INTRODUCTION


The fact that actuarial science is fundamentally
a  branch  of  biology  rather  than  of  mathematics  is 
overlooked  far  more  generally  than  ought  to  be  the 
case.  Most  people,  even  those  of  education  and  wide 
culture,  are  inclined  to  look  upon  an  actuary  as  a 
particularly  crabbed,  narrow,  and  intellectually  dusty 
kind  of  mathematician.  In  reality  his  subject  is  one 
of  the  liveliest  in  the  whole  domain  of  biology,  and 
none surpasses it in its practical interest and importance
to mankind. Because, what the actuary is, or
at  least  should  be,  trying  always  to  formulate  more 
and  more  definitely  are  the  laws  which  determine 
the  duration  of  human  life.  Why  the  actuary  in  fact 
is  too  often  intellectually  but  little  more  than  a  sort 
of  glorified  computer,  is  really  only  the  result  of  a 
defect  in  the  teaching  of  biology  in  our  colleges  and 
universities.  It  has  only  lately  come  to  be  recognized 
anywhere  that  a  biologist  needed  a  substantial  founda- 
tion in  mathematics  in  order  successfully  to  practise 
a  biological  profession.  It  is  not  too  rash  a  prediction 
to  say  that  presently  the  time  is  coming  when  no 
important  actuarial  post  will  be  held  by  a  mathe- 
matician who  knows  little  or  no  biology.  The  vigor 
and  originality  of  his  biological  outlook  will  be  valued 
as highly as the rigidity of his mathematical substructure
now is.

The thing which chiefly makes this book by my
friend Arne Fisher notable lies, in a broad sense,
in  the  fact  that  it  is  a  highly  original  and  absolutely 
novel  essay  in  general  biology.  The  language  is  to  a 
considerable  extent  mathematical,  to  be  sure,  but  the 
subject  matter,  the  mode  of  logical  approach,  and  the 
significant  conclusion  —  all  these  are  pure  biology. 
Unfortunately  many  biologists  will  not  be  able  to 
appreciate  its  significance,  or  even  to  read  it  intel- 
ligently. But  this  is  their  loss,  and  at  the  same  time 
an  exposure  of  the  dire  poverty  of  their  intellectual 
equipment  for  dealing  with  the  problems  of  their 
science. 

There are two broad features of Fisher's work
which  want  emphasis.  The  first  is  the  successful 
construction  of  a  life  table  from  a  knowledge  of  deaths 
alone.  That  the  construction  is  successful  his  results 
set forth in this book abundantly demonstrate. To
have  done  this  is  a  mathematical  and  actuarial 
achievement  of  the  first  rank.  It  may  fairly  be 
regarded  as  fundamentally  the  most  significant  ad- 
vance in  actuarial  theory  since  Halley.  It  opens  out 
wonderful  possibilities  of  research  on  the  laws  of 
mortality,  in  directions  which  have  hitherto  been 
wholly  impossible  of  attack.  The  criterion  by  which 
the  significance  of  a  new  technique  in  any  branch  of 
science  is  evaluated,  is  just  this  of  the  degree  to  which 
it  opens  up  new  fields  of  research.  By  this  criterion 
Fisher's  work  stands  in  a  high  and  secure  position. 

But of vastly more significance considered purely
as  an  intellectual  achievement  is  his  discovery  of 
the  fundamental  biological  law  relating  the  several 
causes of death to each other, which made the technical
accomplishment possible. More than one accepted
text book on vital statistics has scornfully instructed
its readers that no good whatever could come from
any tabulation or study of death ratios; that they must
be avoided as the pestilence by any statistician who
would  be  orthodox.  But  orthodoxy  and  discovery  are 
as  incompatible  intellectually  as  oil  and  water  are 
physically,  a  cosmic  law  often  overlooked  by  our 
"safe and sane" scientific gentry. This book is an
outstanding demonstration that this law is still in
operation. Fisher has had the temerity to study the
ratios of deaths from one cause or group of causes
to those from another group, or to all causes together,
and  has  discovered  that  there  abides  a  real  and 
hitherto  unsuspected  lawfulness  in  these  ratios.  Here 
again  his  pioneer  work  opens  out  alluring  vistas  to 
the thoughtful biometrician.

Altogether  we  of  America  are  to  be  warmly 
congratulated  that  this  brilliant  Danish  mathematical 
biologist has chosen to come and live with us.

Baltimore,  November  1921. 

Raymond  Pearl. 


AUTHOR'S  PREFACE 


The classical method of measuring mortality rests
essentially  upon  the  fundamental  principles  first 
enunciated  by  the  British  astronomer,  Halley,  in  his 
construction  of  the  famous  Breslau  Life  Table.  Since 
the  time  of  Halley  this  method  has  been  so  thoroughly 
investigated  and  has  been  perfected  to  such  an  extent 
that  new  developments  along  this  line  cannot  be 
expected.  Any  improvements  on  the  original  principles 
of  Halley  are  after  all  nothing  but  refinements  in 
graduating  methods;  and  even  in  this  line  it  appears 
that the limit of further perfection has been reached.

Halley's  method,  which  is  purely  empirical  in 
scope  and  principle,  rests  primarily  upon  the  know- 
ledge of  the  number  of  persons  exposed  to  risk  at 
various ages and the correlated number of deaths
among  such  exposures.  In  all  cases  where  such 
information  is  at  hand  the  old  and  tried  method  meets 
all  requirements  to  our  full  satisfaction;  and  it  would 
appear  superfluous  to  try  to  supplant  it  with  fun- 
damentally different  principles. 

In  presenting  the  new  method  outlined  in  this 
little  book  I  wish  to  state  most  emphatically  that  it 
has  never  been  my  intention  to  try  to  supersede  the 
conventional methods of construction of mortality
tables  wherever  such  methods  are  applicable.  My 
proposed method is only a supplement to the former
tools of statisticians and actuaries, and aims to
utilize  numerous  statistical  materials  to  which  the 
older  system  of  Halley  is  not  applicable.  The  idea, 
whether  it  is  new  or  not,  meets  in  reality  a  very 
frequent need in mortality investigations. It is a well
known fact that in the determination of certain
statistical ratios it is easier to determine the numerator
than the denominator, as for instance in life
or sickness assurance, where the losses can be
ascertained with a very close degree of accuracy
while the collection of persons exposed to risk at
various ages is often difficult to obtain. Similar
remarks  hold  true  in  the  case  of  numerous  statistical 
summaries  of  mortuary  records  as  published  in  most 
government  reports  on  vital  statistics.  The  desire  to 
utilize this enormous statistical material was what
led  me  to  try  the  proposed  method. 

In  principle  the  plan  is  fundamentally  different 
from  that  of  the  empirical  method  of  Halley,  inasmuch 
as  I  have  attempted  to  substitute  the  inductive 
principle  for  that  of  pure  empiricism. 

In the first place, I consider the $d_x$ curve, or the
number  of  deaths  by  attained  ages  among  the 
survivors  of  an  original  cohort  of  say  1,000,000 
entrants  at  age  10,  as  being  generated  as  a  compound 
curve  of  a  limited  number  (say  8  or  less)  of  subsidiary 
component  curves  of  either  the  Laplacean-Charlier  or 
Poisson-Charlier  type. 

The method of induction now consists in determining
the constants or parameters of these subsidiary
curves. These parameters fall into two
separate categories:

A. The statistical characteristics or semi-invariants
which determine the relative frequency distribution
by attained age at death, as expressed by the
mean,  the  dispersion,  the  skewness  and  the  excess 
of  each  subsidiary  or  component  curve. 

B.  The  areas  of  each  subsidiary  or  component 
curve. 

The working hypothesis which I have put forward
is that the relative frequency distribution of deaths by attained
ages, classified according to a limited number of
groups (generally 8 or less) of causes of death among the
survivors of the original cohort of entrants, tends to cluster
around certain ages in such a way that it is possible from
biological considerations to estimate in practice with a
sufficiently close degree of approximation the statistical
characteristics or semi-invariants of the relative frequency
distributions of the component curves, corresponding to a
previously chosen classification of causes of death (into 8
or less subsidiary groups).

This  implies  briefly  that  I  suppose  it  is  possible 
from  biological  considerations  to  select  a  priori  the 
statistical  characteristics  of  the  category  as  mentioned 
above  under  A. 

Once  this  hypothesis  is  accepted  as  a  true  supposi- 
tion, the  areas  of  each  of  the  component  curves  can 
be  determined  by  purely  deductive  methods  (as  for 
instance  the  method  of  least  squares)  from  the 
observed values of the proportionate death ratios
$R_B(x)$ ($x = 10, 11, 12, \ldots, 100$; $B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots$)
corresponding to the groups of causes of death.

Thus  the  parameters  as  determined  in  this 
manner  exhaust  the  given  statistical  material,  i.e. 
the observed proportionate death ratios $R_B(x)$. A
mere addition of the subsidiary or component curves
gives us then the compound $d_x$ curve from which it
is an easy task to find the functions $l_x$ and $q_x$.
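
As a minimal sketch of this last step only (it is not Fisher's own computation), assume the compound $d_x$ curve is already available as a list of deaths by attained age for a closed cohort entering at age 10. The survivors $l_x$ then follow by successive subtraction, and $q_x = d_x / l_x$. The function name `life_table_from_deaths` and the toy figures are illustrative assumptions.

```python
def life_table_from_deaths(deaths, start_age=10):
    """Return (age, l_x, d_x, q_x) rows for a closed cohort.

    deaths[i] is d_x at age start_age + i. The cohort is assumed to be
    extinct after the last age, so the radix equals sum(deaths).
    """
    survivors = sum(deaths)          # l_x at the entry age (the radix)
    rows = []
    for offset, d_x in enumerate(deaths):
        q_x = d_x / survivors if survivors > 0 else 0.0
        rows.append((start_age + offset, survivors, d_x, q_x))
        survivors -= d_x             # l_{x+1} = l_x - d_x
    return rows


# Toy d_x values (hypothetical, not taken from any of Fisher's tables).
toy_deaths = [100, 200, 300, 400]
table = life_table_from_deaths(toy_deaths)
for age, l_x, d_x, q_x in table:
    print(age, l_x, d_x, round(q_x, 4))
```

The sketch needs no exposed-to-risk data because the radix is simply the total of the $d_x$ column; all the difficulty of the method lies in obtaining that column from the proportionate death ratios.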

The  scheme  as  we  have  briefly  outlined  it  above 
is,  therefore,  not  a  cut-and-dried  doctrine  or  a  sort 
of  "mathematical  alchemy"  as  some  of  my  critics 
have  implied.  Nor  is  it  an  authoritative  or  infallible 
dogma.  The  keystone  upon  which  its  success  depends 
is  merely  a  working  hypothesis;  i.e.  a  temporary  or 
preliminary  supposition.  I  suppose  something  to  be 
true  and  try  to  ascertain  whether,  in  the  light  of  that 
supposed  truth,  certain  facts  fit  together  better  than 
they  do  with  any  other  supposition  hitherto  tried. 

The  validity  of  the  working  hypothesis  must,  in 
my  opinion,  be  proved  or  disproved  either  by 
independent  methods  and  principles  of  construction 
of mortality tables, such as for instance the empirical
principle of Halley, hitherto exclusively used by the
actuaries, or through additional biological studies.*

* The biological basis of Mr. Fisher's working hypothesis, which is
of far greater importance than the purely ancillary mathematical deduction,
has apparently been overlooked by many of his American critics,
such as Little, Thompson and Carver. Dr. Carver in the Proceedings
of the Casualty Actuarial Society of America (Vol. VI, page 357)
remarks that "if we can construct a table from death alone as in Proc.
Vol. IV, and by dividing these deaths by $q_x$, determine the unenumerated
population – why not the converse?"

The answer to this remark is obvious. In the case of mortuary
records, Fisher considered two different and distinct attributes, namely
1) the purely quantitative attribute of attained age at death, and 2) the
purely biological attribute of cause of death, which in conjunction with
the working hypothesis to a certain extent aims to replace the unknown
exposures. If we were to follow Dr. Carver's facetious suggestion and, to
use his phrase, "go the proposed plan one better by using enumerated
populations only", we should, however, encounter a statistical series with
the single attribute of attained age only, but no second attribute corresponding
to that of the biological factor of the cause of death. Criticisms
of the sort of Dr. Carver's bring to light the fundamentally different
principles applied by Mr. Fisher in sharp contradistinction to the purely
empirical methods of the orthodox actuary and statistician.

Translator.

In the meantime I feel justified in presenting to
my readers the practical results obtained by this
method, which although perhaps not unimpeachable
in respect to mathematical rigour, nevertheless in my
opinion offers a means to attack a vast bulk of
collected statistical data against which our former
actuarial tools proved useless. The celebrated Russian
mathematician Tchebycheff once made a remark to
the effect that in the antique past the Gods proposed
certain problems to be solved by man, later on the
problems were presented by half-gods and great men,
while now dire necessity forces us to seek some
solution to numerous practical problems connected
with our daily conduct. The problem towards which
I have made an attempt to offer a sort of solution in
the present little essay is one of these numerous
problems of dire necessity mentioned by Tchebycheff,
and I hope that my work along this line, imperfect
as it is, may nevertheless prove a beginning towards
more improved methods in the same direction.

In conclusion I wish to extend my thanks to a
number of friends and colleagues both in America
and Europe and Japan who have kept on encouraging
me in my work along these lines in spite of much
adverse criticism from certain statistical and actuarial
circles. I wish in this connection to thank Mr. F. L.
Hoffman, Statistician of the Prudential Insurance
Company, for permitting me to apply the method to
various collections of mortuary records while working
as a computer in his department. My thanks are also
due to Mr. E. A. Vigfusson for making the translation
from my rough Danish notes. If the resulting
English is perhaps open to criticism, I beg to remind
the reader that my original manuscript was written
in Danish and translated into English by an Icelander,
while the composition and proof reading was done
by a Copenhagen firm.

To  Professor  Glover  of  the  University  of  Michigan 
I  also  wish  to  extend  my  thanks  for  inviting  me  to 
deliver  a  series  of  lectures  on  the  construction  of 
mortality  tables  before  his  classes  in  actuarial 
methods  during  the  month  of  March  1919.  This 
invitation  afforded  me  the  first  opportunity  to  bring 
the  proposed  method  before  a  professional  body  of 
statistical  readers. 

Last  but  not  least  I  desire  to  acknowledge  my 
obligations  to  Professor  Pearl  whose  introductory 
note  I  consider  the  strongest  part  of  the  book.  In 
these  departments  of  knowledge  the  appreciation  of 
one's peers is after all the only real reward one can
possibly  expect.  The  fact  that  this  eminent  biologist 
has  recognized  that  the  nucleus  of  the  whole  problem 
is  of  a  purely  biological  nature,  and  that  the 
mathematical  analysis  is  merely  ancillary,  is 
particularly  pleasing  to  me,  because  it  represents  my 
own  view  in  this  particular  matter. 

p.  t.  Newark,  U.  S.  A.,  November  1921. 

Arne Fisher.


During the spring of 1919 the attention of the
present  writer  was  called  to  a  brief  paper  entitled 
Note  on  the  Construction  of  Mortality  Tables  by  means  of 
Compound Frequency Curves by the Danish statistician,
Mr. Arne Fisher. The novelty and originality of this
paper  impressed  me  to  such  an  extent  that  I  became 
desirous  of  obtaining  more  detailed  information  about 
the  process  than  that  which  necessarily  was  contained 
in  the  above  summary  note,  originally  printed  in  the 
Proceedings of the Casualty and Actuarial Society of
America. 

I wrote therefore to Mr. Fisher and inquired
whether he intended to publish any further studies
on  this  subject.  From  his  reply  I  learned  that  he  had 
delivered  a  series  of  lectures  on  this  very  topic  before 
Professor  Glover's  insurance  classes  at  the  University 
of  Michigan  during  the  month  of  March  1919,  but  that 
the  proposed  method  had  been  met  with  such  captious 
opposition  in  certain  actuarial  circles  that  he  had 
decided  to  abandon  the  plan  of  publishing  anything 
further  on  the  subject  and  had  even  destroyed  the 
English notes prepared for the Michigan lectures.

In  the  meantime  the  proposed  scheme  had 
received  considerable  attention  in  actuarial  circles  in 
Europe and Japan and several highly commendatory
reviews had appeared in the English and Continental
insurance  periodicals  and  various  scientific  journals, 
notably  the  Journal  of  the  Royal  Statistical  Society  and 
the Bulletin de l'Association des Actuaires Suisses. The
proposed  method  seemed  indeed  so  novel  and  unique 
that I could not help feeling that it deserved a
better  fate  than  that  of  being  forgotten.  I  sug- 
gested therefore  to  Mr.  Fisher  that  he  prepare  a 
new  manuscript.  But  unfortunately  his  time  did  not 
allow  this.  He  consented,  however,  to  turn  over  to 
me  his  original  Danish  notes  on  the  subject  from 
which  he  had  prepared  his  Michigan  lectures  and 
permitted  me  to  make  an  English  translation  for  the 
Scandinavian  Insurance  Magazine.  I  gladly  availed 
myself  of  this  opportunity  to  bring  this  fundamental 
work  before  an  international  body  of  readers  and 
started  on  the  translation  in  the  summer  of  1919. 

At  the  same  time  Mr.  Fisher  decided  to  put  the 
proposed  method  and  working  hypothesis  to  a  very 
severe  test,  which  would  meet  even  the  most  stringent 
requirements  of  some  of  his  critics  and  their  conten- 
tion that  the  method  would  fail  in  the  case  of  a 
rapidly  changing  population  group.  For  this  purpose 
he  selected  a  series  of  statistical  data  contained  in  the 
annual  reports  and  statements  of  a  number  of  the 
leading Japanese Life Assurance Offices, relating to
their  mortuary  records  for  the  four  year  period  from 
1914—1917.  More  than  35,000  records  of  male  lives, 
arranged  according  to  the  Japanese  list  of  causes  of 
death  and  grouped  in  quinquennial  age  intervals 
formed  the  basis  for  the  construction  of  the  final 
life table which was completed in November 1919.
This table, which like Mr. Fisher's other tables was
derived without any information of the number of
lives exposed to risk at various ages, is shown in the
addenda  of  this  treatise. 

Immediately  after  its  construction  Mr.  Fisher  sent 
this table to the well known Japanese actuary, Mr.
T. Yano, and asked him for an opinion regarding the
trustworthiness of the final death rates of $q_x$ as
derived  by  his  new  method.  The  Japanese  actuary's 
answer  arrived  in  April  1920.  Mr.  Yano  had  after 
the  receipt  of  Mr.  Fisher's  letter  ascertained  the 
exposures  and  deaths  among  male  lives  at  each 
separate age for about 40 Japanese life offices during
the period 1914-1917 and constructed by means of
the conventional methods a complete series of $q_x$ by
integral  ages  from  age  10  to  90.  These  ungraduated 
data  are  shown  as  a  broken  line  polygon  in  the 
appended  diagram  (Figure  1).  In  spite  of  the  fact  that 
Mr.  Fisher  had  no  information  whatever  about  the 
exposed  to  risk  the  agreement  of  the  continuous  curve 
of $q_x$ as determined by the frequency curve method
with Mr. Yano's ungraduated data is so close that
I think further comments superfluous. The slight
differences in younger ages might indeed arise from
the fact that Mr. Yano had access to all the experience
(containing more than 45,000 deaths) of all the Japanese
companies, whereas Fisher only used the
mortuary  records  as  published  by  some  of  the  leading 
Japanese  companies. 

Like  all  scientific  methods  of  induction  Mr.  Fi- 
sher's proposed  plan  rests  upon  a  working  hypothesis, 
namely  that  it  is  possible  from  biological  considera- 
tions to  group  the  deaths  among  the  survivors  at 
various  ages  in  any  mortality  table  according  to 
causes  in  such  a  manner  that  their  percentage  or 
relative frequency distribution according to attained
age at death will conform to a previously selected
system or family of Laplacean-Charlier or Poisson-Charlier
frequency curves.

[Fig. 1. Mr. Yano's ungraduated $q_x$ (broken line polygon) compared with the continuous $q_x$ curve determined by the frequency curve method.]

Mr. Fisher himself is very
frank in stating that this is a working hypothesis
upon which hinges the success of the whole method.
One  of  the  main  objections  of  his  critics  is  that  it 
seems  impossible  to  prove  the  truth  of  this  working 
hypothesis.  Naturally  its  truth  cannot  be  proved  by 
mathematics or logic any more than we can prove
or  disprove  the  existence  of  Euclidean  space,  which 
in  itself  constitutes  a  working  hypothesis  for  most 
of  our  applied  mathematics.  Mr.  Fisher's  critics  might 
as  well  be  asked  to  prove  or  disprove  Newton's 
hypothetical  laws  of  motion  and  attraction  as 
extended  by  Maxwell  and  Hertz,  or  the  newer 
hypothesis recently put forward by the relativists,
or the Lorentz hypothesis of contraction. It would
indeed be a terrific blow to science and the extension
of knowledge if it were required that no working
hypothesis be allowed in scientific work unless
such hypothesis could be proved to be true. What
position would biology occupy to-day if biologists had
insisted that Darwin's great hypothesis be proved
before it could be allowed as a foundation in the study
of  evolution? 

The  most  convincing  answer  to  Mr.  Fisher's 
captious  critics  among  the  old  school  of  actuaries 
and  statisticians  is,  however,  the  undisputed  fact  that 
his  working  hypothesis  as  such  really  does  work. 
As  pointed  out  by  Dr.  Pearl  in  the  introductory  note 
of  this  book  the  results  set  forth  in  the  present 
treatise  abundantly  demonstrate  this  fact.  The  6 
widely  different  mortality  tables  as  shown  in  the 
addenda  stand  as  mute  and  yet  as  the  most  eloquent 
evidence  to  the  fact  that  the  method  works.  It  might 
indeed  not  appear  impertinent  to  suggest  that  Mr. 
Fisher's  actuarial  critics  would  render  a  greater 
service to their profession by proving that these six
mortality tables cannot be considered as reasonable
approximations to tables derived by orthodox means
from the same population groups than by starting
to pooh-pooh and ridicule his proposed method.

Winnipeg,  Canada,  November  1921. 

E.  A.  Vigfusson. 


"Nothing  is  less  warranted  in  science  than  an  uninqui- 
ring  and  unhoping  spirit.  In  matters  of  this  kind,  those 
who  despair  are  almost  invariably  those  who  have  never 
tried  to  succeed." 

W.  Stanley  Jevons. 


CHAPTER  I 

(TRANSLATED  BY  MISS  DICKSON) 


AN  INTRODUCTION  TO  THE  THEORY  OF 
FREQUENCY  CURVES 

1.  INTRODUCTION  The  following  method  of  con- 
structing mortality  tables  from 
mortuary  records  by  sex,  age 
and  cause  of  death  rests  essentially  upon  the 
theory  of  frequency  curves  originally  introduced 
by  the  great  Laplace  and  of  recent  years  further 
developed  and  extended  through  the  elegant  and 
far  reaching  researches  of  the  Scandinavian  school 
of  statisticians  under  the  leadership  of  Gram, 
Charlier  and  Thiele  and  their  disciples.  This 
method  is,  however,  comparatively  little  known 
and  unfortunately  not  always  fully  appreciated 
by  the  majority  of  English  statisticians  and  ac- 
tuaries, who  prefer  to  apply  the  well  known 
methods  of  the  eminent  English  biometrician, 
Karl  Pearson.  For  this  reason  it  may  be  advisable 
to give a preliminary sketch of Charlier's methods
so as to obtain a better understanding of the
following chapters dealing with the more specific
problem of mortality tables. The treatment must
necessarily be brief and represents essentially an
outline of the more detailed theory which I hope
to present in my forthcoming second volume of
the Mathematical Theory of Probabilities.

By  the  method  of  Charlier  any  frequency 
function  is  expressed  as  an  infinite  series  rather 
than  as  a  closed  and  compact  algebraic  or  tran- 
scendental expression  by  the  Pearsonian  methods. 
By  power  series  the  thoughts  of  the  majority  of 
students  are  associated  with  the  famous  series 
which  bear  the  names  of  Taylor  and  Maclaurin. 
In  these  series  the  function  is  derived  as  an  in- 
finite series  of  ascending  powers  of  the  inde- 
pendent variable  whose  coefficients  are  expressed 
by  means  of  the  correlated  successive  derivatives 
of the function for specific values of $x$. Thus
for  instance  we  know  that  the  Maclaurin  series 
may  be  written  as  follows : 

$$f(x) = f(0) + \frac{x}{1!} f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots$$

where $f^{(n)}(0)$ is the symbol for the value of the $n$th
derivative when $x = 0$ and $n = 1, 2, 3, 4, \ldots$.
There  are,  however,  contrary  to  the  belief  of 
many  immature  students,  only  comparatively  few 
functions which allow a rigorous expansion by this
method, in which the derived functions and the
differential  calculus  play  the  leading  roles. 
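
A small numerical illustration, offered as a sketch and not as part of the original argument: the function $1/(1+x^2)$ is smooth for every real $x$, yet its Maclaurin series $1 - x^2 + x^4 - \cdots$ represents it only for $|x| < 1$. The helper names `maclaurin_partial_sum` and `target` are assumptions made for the example.

```python
def maclaurin_partial_sum(x, terms):
    """Partial sum of the Maclaurin series of 1/(1+x^2): sum of (-1)^k x^(2k)."""
    return sum((-1) ** k * x ** (2 * k) for k in range(terms))


def target(x):
    """The function being expanded."""
    return 1.0 / (1.0 + x * x)


for x in (0.5, 2.0):
    errors = [abs(maclaurin_partial_sum(x, n) - target(x)) for n in (5, 10, 20)]
    # The errors shrink as terms are added for x = 0.5 and grow for x = 2.0.
    print(x, errors)
```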

But  on  the  other  hand  there  are  other  methods 
of  expansions  in  infinite  series  which  are  more 
general  and  by  which  the  coefficients  of  the  in- 
dependent variable  are  expressed  by  operations 
other  than  those  of  differentiation.  One  of  these 
methods  is  to  express  the  coefficients  as  definite 
integrals  either  of  the  unknown  function  itself  or 
some  auxiliary  function. 
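
To make the idea of coefficients given by definite integrals concrete, the following sketch expands $|x|$ on $[-1, 1]$ in a cosine series. This is an ordinary Fourier expansion chosen for familiarity, not the Gram or Charlier series treated later. The function $|x|$ has no derivative at $x = 0$ and hence no Maclaurin series, yet every coefficient $a_k = 2\int_0^1 f(x)\cos(k\pi x)\,dx$ exists. The midpoint rule and all function names are assumptions of the example.

```python
import math


def integrate(func, a, b, steps=2000):
    """Midpoint-rule approximation of the definite integral of func on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def cosine_coefficients(func, count):
    """Coefficients of the cosine series of an even function on [-1, 1].

    a_0 is the mean value; a_k = 2 * integral_0^1 func(x) cos(k pi x) dx.
    """
    coeffs = [integrate(func, 0.0, 1.0)]
    for k in range(1, count):
        coeffs.append(
            2.0 * integrate(lambda x: func(x) * math.cos(k * math.pi * x), 0.0, 1.0)
        )
    return coeffs


def evaluate(coeffs, x):
    """Sum the truncated series a_0 + sum of a_k cos(k pi x)."""
    return sum(a * math.cos(k * math.pi * x) for k, a in enumerate(coeffs))


coeffs = cosine_coefficients(abs, 40)
print(coeffs[:4])
print(evaluate(coeffs, 0.3), abs(0.3))
```

The coefficients are obtained from the function as a whole, by summation over its range, rather than from its local behaviour at a single point; this is the sense in which such expansions apply to a wider class of functions.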

The  range  of  practical  problems  which  lay 
themselves  open  to  a  successful  attack  along  those 
lines  is  much  wider  than  the  corresponding  range 
of  practical  problems  to  which  we  may  apply  the 
Taylor  series. 

Speaking  generally  as  a  layman  (who  continu- 
ously has  to  face  practical  rather  than  abstract 
problems)  and  specifically  as  a  mathematical 
novice  (who  considers  mathematics  as  a  means 
rather than as an end) this fact appears to me
quite  obvious  from  a  purely  philosophical  point  of 
view.  In  nature  and  in  all  practical  observations 
we  encounter  finite  and  not  infinitesimal  quantit- 
ies. In  other  words,  what  we  actually  observe  are 
finite  sums  or  definite  integrals,  i.  e.  the  limit  of 
a  sum  of  infinitely  small  component  parts. 

The  definite  integral  rather  than  the  derivative 
and the differential seems, therefore, to be the
more elementary and primitive operation and the
one which suggests itself first hand. History of
Mathematics indeed proves this contention. Archimedes
had (as shown by the researches of the
Danish scholar, Heiberg) laid the essential foundation
for an integral calculus in the third century B. C.
And more than 21 centuries later, almost simultaneously
with the historical discovery of Heiberg another
Scandinavian, the Swedish mathematician
and actuary, Fredholm, gave to the world his
epoch-making work on integral equations. Fredholm's
monumental memoir "Sur une nouvelle
méthode pour la résolution du problème de Dirichlet"
was first published in the "Öfversigt af akademiens
förhandlingar" (Stockholm 1900). Measured
by time the subject of integral equations is
thus  a  mere  infant  in  the  history  of  mathematical 
discoveries.  Measured  by  its  importance  it  has 
already  become  a  classic.  Its  application  to  a 
steadily increasing number of essentially practical
problems  in  almost  every  branch  of  science  has 
placed  it  in  a  central  position  of  modern  mathe- 
matical research  and  it  bids  fair  to  become  the 
most  important  branch  of  mathematics. 

Fredholm  in  introducing  his  now  famous  in- 
finite determinants,  known  as  the  Fredholmean 
determinants,  had  a  forerunner  in  the  Danish 
actuary, Gram, whose Doctor's dissertation "Om
Rækkeudviklinger ved de mindste Kvadraters Metode"
(Copenhagen 1879) gave prominence to a
certain  class  of  functions  which  later  on  have 
become  known  as  orthogonal  functions,  and  by 
which  Gram  actually  gave  the  first  expansion  of 
a  frequency  distribution  or  frequency  curve  in 
an infinite series. Scandinavians in general and
Scandinavian actuaries in particular may, therefore,
feel proud of their share of imparting knowledge
on this important subject, which makes a
strong  bid  to  place  mathematics  on  a  higher  plane 
than  ever  before,  not  alone  as  an  abstract  but 
equally  well  as  an  applied  science.  The  genius 
of the Italian renaissance, Leonardo da Vinci, as
early  as  1479  proclaimed  "that  no  part  of  human 
knowledge  could  lay  claim  to  the  title  of  science 
before  it  had  passed  through  the  stage  of  mathe- 
matical demonstration".  Comparatively  few  bran- 
ches of  learning  measure  up  to  the  standard  of 
Leonardo  da  Vinci,  and  our  learned  friends  among 
the  economists  and  sociologists  have  a  long  road 
to  travel  before  they  succeed  in  placing  their 
methods  in  the  coveted  niche  of  science.  But  the 
new  vistas  of  possibilities  opened  up  to  them  by 
means  of  M.  Fredholm's  discovery  ought  to 
furnish  them  a  powerful  tool  towards  the  attain- 
ment of  the  high  standard  set  by  the  great  Italian. 
The  principal  theorems  of  integral  equations 
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are  bound  to  be  especially  fruitful  in  their  ap- 
plication to  mathematical  statistics  and  the  pro- 
blems of  frequency  curves  and  frequency  surfaces 
together  with  the  associated  problems  of  mathe- 
matical correlation. 

2. FREQUENCY DISTRIBUTIONS AND FUNCTIONS

If N successive observations originating from the same essential
circumstances or the same source of causes are made in respect to a
certain statistical variate, x, and if the individual observations
$o_i$ (i = 1, 2, 3, ..., N) are permuted in an ascending order then
this particular permutation is said to form a frequency distribution
of x and is denoted by the symbol $F(x)$.

The relative frequencies of this specific permutation, that is the
ratio which each absolute frequency or group of frequencies bears to the
total number of observations, is called a relative frequency function
or probability function and is denoted by the symbol $\varphi(x)$.

If the statistical variate is continuous or a graduated variate, such
as heights of soldiers, ages at death of assured lives, physical and
astronomical precision measurements, etc., then

$$ \varphi(z)\,dz $$

is the probability that the variate x satisfies the following relation

$$ z - \tfrac{1}{2}dz < x < z + \tfrac{1}{2}dz $$

or that x falls between the above limits.

If the statistical variate assumes integral (discrete) values only,
such as the number of alpha particles radiated from certain metals and
radioactive gases as polonium and helium, number of fin rays in fishes,
or number of petals in flowers, then $\varphi(z)$ is the probability
that x assumes the value z. From the above definitions it follows
a fortiori that

(a) $F(z) = N\varphi(z)$   (Integral variates)

(b) $F(z)\,dz = N\varphi(z)\,dz$   (Graduated variates)

Interpreting the above results graphically we find that (a) will be
represented by a series of disconnected or discrete points while (b)
will be represented by a continuous curve.

As to the function $\varphi(z)$ we make for the present no other
assumptions than those following immediately from the customary
definition of a mathematical probability. That is to say the function
$\varphi(z)$ must be real and positive.

Moreover it must also satisfy the relation

$$ \int_{-\infty}^{+\infty} \varphi(z)\,dz = 1, $$

or in the case of discrete variates:

$$ \sum_{z=-\infty}^{z=+\infty} \varphi(z) = 1 $$

which is but the mathematical way of expressing the simple hypothetical
disjunctive judgment that the variate is sure to assume some one or
several values in the interval from $-\infty$ to $+\infty$. The zero
point is arbitrarily chosen and need not coincide with the natural zero
of the number scale. Thus for instance if we in the case of height of
recruits choose the zero point of the frequency curve at 170
centimeters an observation of 180 centimeters would be recorded as +10
and an observation of 160 centimeters as -10.
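The definitions of this section can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function name `frequency_distribution`, its `origin` argument, and the eight heights are hypothetical and not taken from the text. It arranges the observations in ascending order, counts the absolute frequencies F(z), divides by N to obtain the relative frequencies, and records each observation from an arbitrarily chosen zero point of 170 centimeters.

```python
from collections import Counter


def frequency_distribution(observations, origin=0):
    """Return the absolute frequencies F(z) and relative frequencies phi(z).

    Each observation is recorded as its distance z from the chosen origin.
    """
    n = len(observations)
    counts = Counter(o - origin for o in observations)
    absolute = dict(sorted(counts.items()))             # F(z), ascending in z
    relative = {z: c / n for z, c in absolute.items()}  # phi(z) = F(z) / N
    return absolute, relative


# Hypothetical heights of eight recruits in centimeters (not data from the text).
heights_cm = [160, 170, 170, 180, 170, 180, 160, 170]
F, phi = frequency_distribution(heights_cm, origin=170)

for z in F:
    print(f"z = {z:+d}   F(z) = {F[z]}   phi(z) = {phi[z]:.3f}")
print("sum of phi(z) =", sum(phi.values()))
```

The relative frequencies add up to unity, and F(z) = N phi(z) holds for every recorded value z, as relation (a) requires for integral variates.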

3. PROPERTY OF CONSTANTS OR PARAMETERS

In regard to a frequency function we may assume a priori that it will
depend only upon the variate x and certain mathematical relations into
which this variate enters with a number of constants $\lambda_1,
\lambda_2, \lambda_3, \lambda_4, \ldots$ symbolically expressed by the
notation

$$ F(x, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots) $$

where the $\lambda$'s are the constants and x the variate. All these
constants or parameters are naturally independent of x and represent
some peculiar properties or characteristic essentials of the frequency
function as expressed in the original observations $o_i$ (i = 1, 2, 3,
..., N). We may, therefore, say that each constant or statistical
parameter entering into the final mathematical form for the frequency
function is a function of the observations $o_i$. This fact may be
expressed in the following symbolic form:

$$ \lambda_1 = S_1(o_1, o_2, o_3, \ldots, o_N) $$
$$ \lambda_2 = S_2(o_1, o_2, o_3, \ldots, o_N) $$


But from purely a priori considerations we are able to tell something
else about the functions $S_i$ (i = 1, 2, 3, ..., N). It is only when
permuting the various o's in an ascending magnitude according to the
natural number scale that we obtain a frequency function. This
arrangement itself has, however, no influence upon any one of the o's
which were generated before this purely arbitrary permutation took
place. The ultimate and previously measured effects of the causes as
reflected in each individual numerical observation, $o_i$, depend only
upon the origin of causes which form the fundamental basis for the
statistical object under investigation and do not depend upon the order
in which the individual o's occur in the series of observations.

Suppose for instance that the observations occurred in the following
order

$$ o_1, o_2, o_3, \ldots, o_N. $$

By permuting these elements in their natural order we obtain the
frequency distribution $F(x)$. But the very same distribution could
have been obtained if the observations had occurred in any other order,
that is, in any other permutation of $o_1, o_2, \ldots, o_N$, so long
as all of the individual o's were retained in the original records. Or
to take a concrete example, consider the study of the number of
policyholders according to attained ages in a life assurance office. We
write the age of each individual policyholder on a small card. When all
the ages have been written on individual cards they may be permuted
according to attained age and the resulting series is a frequency
function of the age x. We may now mix these cards just as we mix
ordinary playing cards in a game of whist, and we get another
permutation, in general different from the order in which we originally
recorded the ages on the cards. But this new permutation can equally
well be used to produce the frequency function if we are only sure to
retain all the cards and do not add any new cards.

4. PARAMETERS AS SYMMETRIC FUNCTIONS

The various functions $S_i(o_1, o_2, o_3, \ldots, o_N)$ are therefore
symmetric functions, that is functions which are left unaltered by
arbitrarily permuting the N elements o, and no interchange whatever of
the values of the various o's in those symmetric functions can have any
influence upon the final form of the frequency function or frequency
curve, $F(x)$.

We now introduce under the name of power sums a certain well known form
of fundamental symmetrical functions defined by the following relations


$$ s_0 = o_1^0 + o_2^0 + o_3^0 + \dots + o_N^0 = N $$
$$ s_1 = o_1^1 + o_2^1 + o_3^1 + \dots + o_N^1 = \sum o_i $$
$$ s_2 = o_1^2 + o_2^2 + o_3^2 + \dots + o_N^2 = \sum o_i^2 $$
$$ s_r = o_1^r + o_2^r + o_3^r + \dots + o_N^r = \sum o_i^r $$

Moreover, a well known theorem in elementary algebra tells us that
every symmetric function may be expressed as a function of $s_1, s_2,
s_3, \ldots, s_N$.

From this theorem it follows a fortiori that we are able to express the
constants $\lambda$ in the frequency curve as functions of the power
sums of the observations. While such a procedure is possible,
theoretically at least, we should, however, in most cases find it a
very tedious and laborious task in actual practice. It, therefore,
remains to be seen whether it is possible to transform these
symmetrical functions of the power sums of the observations into some
other symmetric functions, which are more flexible and workable in
practical computations and which can be expressed in terms of the
various values of s.
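The symmetry of the power sums, and the way another symmetric function can be written in terms of them, may be checked with a short Python sketch. The observations and the helper name `power_sum` are hypothetical and serve only as an illustration; the identity used for the sum of pairwise products is the simplest instance of the algebraic theorem quoted above.

```python
import random


def power_sum(observations, r):
    """s_r = o_1^r + o_2^r + ... + o_N^r."""
    return sum(o ** r for o in observations)


obs = [3, 1, 4, 1, 5]                  # hypothetical observations
shuffled = obs[:]
random.Random(0).shuffle(shuffled)     # the same cards in some other order

s = [power_sum(obs, r) for r in range(4)]
s_shuffled = [power_sum(shuffled, r) for r in range(4)]
print(s, s_shuffled)                   # identical lists: the order is immaterial

# A symmetric function expressed through power sums: the sum of all pairwise
# products o_i * o_j (i < j) equals (s_1^2 - s_2) / 2.
pairwise = sum(obs[i] * obs[j]
               for i in range(len(obs)) for j in range(i + 1, len(obs)))
print(pairwise, (s[1] ** 2 - s[2]) // 2)
```

Shuffling the list corresponds to mixing the cards of the preceding section; the power sums, and with them every function built from the power sums, are unaffected.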

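5. THIELE'S SEMI-INVARIANTS

It is the great achievement of Thiele to have been the first
mathematician to realize this possibility and make this transformation
by introducing into the theory of frequency curves a peculiar system of
symmetrical functions which he called semi-invariants and denoted by
the symbols $\lambda_1, \lambda_2, \lambda_3, \ldots$

Starting with power sums, $s_i$, Thiele defines these by the following
identity

$$ s_0\, e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots} = s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \dots \qquad (1) $$

which is identical in respect to $\omega$.

Since $s_i = \sum o^i$ the right hand side of the equation may also be
written as

$$ e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \dots = \sum e^{o_i\omega}. $$

Differentiating (1) with respect to $\omega$ we have

$$ s_0\, e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \dots}\left[\lambda_1 + \frac{\lambda_2}{1!}\omega + \frac{\lambda_3}{2!}\omega^2 + \dots\right] $$
$$ = \left[s_0 + \frac{s_1}{1!}\omega + \frac{s_2}{2!}\omega^2 + \frac{s_3}{3!}\omega^3 + \dots\right]\left[\lambda_1 + \frac{\lambda_2}{1!}\omega + \frac{\lambda_3}{2!}\omega^2 + \dots\right] $$
$$ = s_1 + \frac{s_2}{1!}\omega + \frac{s_3}{2!}\omega^2 + \frac{s_4}{3!}\omega^3 + \dots $$

Multiplying out and equating the various coefficients of equal powers
of $\omega$ we finally have

$$ s_1 = \lambda_1 s_0 $$
$$ s_2 = \lambda_1 s_1 + \lambda_2 s_0 $$
$$ s_3 = \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0 $$
$$ s_4 = \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0 $$

where the coefficients follow the law of the binomial theorem.

Solving for $\lambda$ we have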

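$$ \lambda_1 = s_1 : s_0 $$

The semi-invariants $\lambda$ in respect to an arbitrary origin and
unit are as we noted defined by the relation

$$ s_0\, e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots} = e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \dots $$

where $o_1, o_2, o_3, \ldots$ are the individual observations.

Let us now change to another coordinate system with another unit and
origin defined by the following linear transformations:

$$ o'_i = a\,o_i + c \qquad (i = 1, 2, 3, \ldots). $$

The semi-invariants in this new system are given by the relation

$$ s_0\, e^{\frac{\lambda'_1}{1!}\omega + \frac{\lambda'_2}{2!}\omega^2 + \frac{\lambda'_3}{3!}\omega^3 + \dots} = e^{o'_1\omega} + e^{o'_2\omega} + e^{o'_3\omega} + \dots = e^{(a o_1 + c)\omega} + e^{(a o_2 + c)\omega} + \dots $$

Since the various values of $\lambda'$ do not depend upon the quantity
$\omega$ we may without changing the value of the semi-invariants
replace $\omega$ by $\omega : a$ in the above equations, which gives

$$ s_0\, e^{\frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \dots} = e^{(a o_1 + c)\frac{\omega}{a}} + e^{(a o_2 + c)\frac{\omega}{a}} + e^{(a o_3 + c)\frac{\omega}{a}} + \dots $$
$$ = e^{\frac{c\omega}{a}}\left[e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \dots\right] $$
$$ = e^{\frac{c\omega}{a}}\, s_0\, e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots} $$

Taking the logarithms on both sides of the equation we have

$$ \frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \dots = \frac{c\omega}{a} + \frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots $$

Differentiating successively with respect to $\omega$ we have

$$ \frac{\lambda'_1}{a} + \frac{\lambda'_2\omega}{a^2} + \frac{\lambda'_3\omega^2}{2a^3} + \dots = \frac{c}{a} + \lambda_1 + \lambda_2\omega + \frac{\lambda_3\omega^2}{2} + \dots $$
$$ \frac{\lambda'_2}{a^2} + \frac{\lambda'_3\omega}{a^3} + \dots = \lambda_2 + \lambda_3\omega + \dots $$
$$ \frac{\lambda'_3}{a^3} + \dots = \lambda_3 + \dots $$

Letting $\omega = 0$ we therefore have

$$ \frac{\lambda'_1}{a} = \frac{c}{a} + \lambda_1 \quad\text{or}\quad \lambda'_1 = a\lambda_1 + c $$
$$ \frac{\lambda'_2}{a^2} = \lambda_2 \quad\text{or}\quad \lambda'_2 = a^2\lambda_2 $$
$$ \frac{\lambda'_3}{a^3} = \lambda_3 \quad\text{or}\quad \lambda'_3 = a^3\lambda_3 $$

from which we deduce the following relations

$$ \lambda_1(ax + c) = a\lambda_1(x) + c $$
$$ \lambda_r(ax + c) = a^r\lambda_r(x) \quad\text{for } r > 1, $$

which shows how the semi-invariants change by
introducing a new origin and a new unit.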
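The relations between the power sums and the semi-invariants lend themselves to a direct numerical check. The following Python sketch is an illustration only: the function name `semi_invariants`, the five observations, and the constants a and c are hypothetical. It solves the binomial recursion of this section for the semi-invariants, written in the general form s_{r+1} = sum over k of C(r, k) lambda_{k+1} s_{r-k}, and then confirms how they change under a new origin and unit. Exact fractions are used so that the comparison is free from rounding.

```python
from fractions import Fraction
from math import comb


def semi_invariants(observations, order):
    """Solve s_{r+1} = sum_k C(r, k) * lambda_{k+1} * s_{r-k} for lambda_1..lambda_order."""
    s = [sum(Fraction(o) ** r for o in observations) for r in range(order + 1)]
    lam = [None]                         # lam[r] holds lambda_r; index 0 is unused
    for r in range(order):
        # Terms of the recursion whose semi-invariants are already known.
        known = sum(comb(r, k) * lam[k + 1] * s[r - k] for k in range(r))
        lam.append((s[r + 1] - known) / s[0])
    return lam[1:]


obs = [1, 2, 2, 3, 7]                    # hypothetical observations
a, c = 2, 5                              # hypothetical new unit and origin

lam = semi_invariants(obs, 4)
lam_new = semi_invariants([a * o + c for o in obs], 4)

print(lam)        # lambda_1 is the mean, lambda_2 the mean square deviation
print(lam_new)
print(lam_new[0] == a * lam[0] + c)
print(all(lam_new[r] == a ** (r + 1) * lam[r] for r in range(1, 4)))
```

Only the first semi-invariant is affected by the shift c; all higher ones are merely multiplied by the corresponding power of a.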

We shall for the present leave the semi-invariants and only ask the
reader to bear in mind the above relations between $\lambda$ and s, of
which we shall later on make use in determining the constants in the
frequency curve $\varphi(x)$.


6. THE FOURIER INTEGRAL EQUATION

Before discussing the generation of the total frequency curve it will,
however, be necessary to demonstrate some auxiliary mathematical
formulae from the theory of definite integrals and integral equations
which will be of use in the following discussion as mathematical tools
with which to attack the collected statistical data or the numerical
observations.

One of these tools is found in the celebrated integral theorem by
Fourier, which was the first integral equation to be successfully
treated. We shall in the following demonstration adhere to the elegant
and simple solution by M. Charlier. Charlier in his proof supposes that
a function, $F(\omega)$, is defined through the following convergent
series.

$$ F(\omega) = \alpha\left[f(0) + f(\alpha)e^{\alpha\omega i} + f(2\alpha)e^{2\alpha\omega i} + \dots + f(-\alpha)e^{-\alpha\omega i} + f(-2\alpha)e^{-2\alpha\omega i} + \dots\right] $$

or

$$ F(\omega) = \alpha\sum_{m=-\infty}^{m=+\infty} f(\alpha m)\,e^{\alpha m\omega i} \qquad (2) $$

where $i = \sqrt{-1}$.

We then see by the well known theorem of Cauchy that the integral

$$ J(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \qquad (3) $$

is finite and convergent. If we now let $m\alpha = x$ and let
$\alpha = 0$ as a limiting value, $\alpha$ becomes equal to $dx$ and
$f(\alpha m) = f(x)$. Consequently we may write

$$ \lim_{\alpha = 0} F(\omega) = J(\omega). $$


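Multiplying (2) by $e^{-r\alpha\omega i}\,d\omega$ and integrating
between the limits $-\pi/\alpha$ and $+\pi/\alpha$ we get on the left
an expression of the form

$$ \int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega $$

and on the right a sum of definite integrals of which, however, all but
the term containing $f(r\alpha)$ as a factor will vanish. This
particular term reduces to

$$ \alpha\int_{-\pi/\alpha}^{+\pi/\alpha} f(r\alpha)\,d\omega \quad\text{or}\quad 2\pi f(r\alpha). $$

Hence we have

$$ f(r\alpha) = \frac{1}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega. \qquad (4a) $$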

By letting $\alpha$ converge toward zero and by the substitution
$r\alpha = x$ this equation reduces to

$$ f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} J(\omega)\,e^{-x\omega i}\,d\omega. \qquad (4b) $$

Charlier has suggested the name conjugated Fourier function of $f(x)$
for the expression $F(\omega)$. We then have, if we introduce a new
function $\psi(\omega)$ defined by the simple relation:

$$ \sqrt{2\pi}\,\psi(\omega) = \lim_{\alpha = 0} F(\omega) $$

$$ \psi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx. \qquad (5a) $$

$$ f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \psi(\omega)\,e^{-x\omega i}\,d\omega. \qquad (5b) $$

The equations (5a) and (5b) are known as integral equations of the
first kind. The expression $e^{x\omega i}$ (or $e^{-x\omega i}$) is
known as the nucleus of the equation. If in (5b) we know the value of
$\psi(\omega)$ we are able to determine $f(x)$. Inversely, if we know
$f(x)$ we may find $\psi(\omega)$ from (5a).
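The reciprocity of (5a) and (5b) can be tried numerically. The sketch below is an illustration under stated assumptions and not Charlier's proof: the infinite limits are cut off at plus and minus 10, the integrals are replaced by the trapezoidal rule, and the test function exp(-x^2/2) is chosen merely because its conjugate function is known in closed form. The helper names `trapezoid`, `conjugate_fourier`, and `invert` are hypothetical.

```python
import cmath
import math


def trapezoid(func, lo, hi, steps):
    """Composite trapezoidal rule for a real or complex integrand."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def conjugate_fourier(f, omega, cutoff=10.0, steps=400):
    """psi(omega) as in (5a), with the infinite limits cut off at +-cutoff."""
    integrand = lambda x: f(x) * cmath.exp(1j * x * omega)
    return trapezoid(integrand, -cutoff, cutoff, steps) / math.sqrt(2 * math.pi)


def invert(psi, x, cutoff=10.0, steps=400):
    """f(x) recovered from psi as in (5b)."""
    integrand = lambda w: psi(w) * cmath.exp(-1j * x * w)
    return trapezoid(integrand, -cutoff, cutoff, steps) / math.sqrt(2 * math.pi)


f = lambda x: math.exp(-x * x / 2)        # test function chosen for illustration
psi = lambda w: conjugate_fourier(f, w)

print(abs(psi(1.0) - math.exp(-0.5)))     # psi(w) should equal exp(-w^2 / 2)
print(abs(invert(psi, 0.7) - f(0.7)))     # (5b) should give back f(0.7)
```

Both printed differences should be negligibly small, since this particular test function is reproduced by its own conjugate function.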


7. FREQUENCY CURVE AS THE SOLUTION OF AN INTEGRAL EQUATION

We are now in a position to make use of the semi-invariants of Thiele,
which hitherto in our discussion have appeared as a rather disconnected
and alien member. In section 5 we saw that the semi-invariants could be
expressed by the relation

$$ s_0\, e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots} = \sum e^{o_i\omega} $$

where $o_i$ (i = 1, 2, 3, ...) denotes the individual observations.

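The definition of the semi-invariants does not necessitate that all the
o's must be different. If some of the o's are exactly alike it is
self-evident that the term $e^{o_i\omega}$ must be repeated as often as
$o_i$ occurs among all of the observations. If therefore
$N\varphi(o_i)$ denotes the absolute frequency of $o_i$ where
$\varphi(o_i)$ is the relative frequency function, then the definition
of the semi-invariants may be written as:

$$ e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots}\sum \varphi(o_i) = \sum \varphi(o_i)\,e^{o_i\omega}. $$

For continuous variates, x, the above sums are transformed into
definite integrals of the form

$$ e^{\frac{\lambda_1}{1!}\omega + \frac{\lambda_2}{2!}\omega^2 + \frac{\lambda_3}{3!}\omega^3 + \dots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega}\,dx. $$

Let us now substitute the quantity $\omega\sqrt{-1}$, or $i\omega$, for
$\omega$ in the above identity. We then have:

$$ e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \dots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega i}\,dx $$

under the supposition that this transformation holds in the complex
region in which the function is defined.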

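In this equation the definite integrals are of special importance. The
factor $\int_{-\infty}^{+\infty}\varphi(x)\,dx$ is, of course, equal to
unity according to the simple considerations set forth in section 2.
The integral on the right hand side of the equation is, however, apart
from the constant factor $\sqrt{2\pi}$ nothing more than the $\psi$
function in the conjugate Fourier function if we let
$\varphi(x) = f(x)$, and

$$ e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \dots} = \sqrt{2\pi}\,\psi(\omega). $$

According to (5b) we may, therefore, write $f(x)$ or $\varphi(x)$ as

$$ \varphi(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-x\omega i}\; e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}(i\omega)^2 + \frac{\lambda_3}{3!}(i\omega)^3 + \dots}\,d\omega $$

as the most general form of the frequency function $\varphi(x)$
expressed by means of semi-invariants.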

8. FIRST APPROXIMATE SOLUTION

The exactness with which $\varphi(x)$ is reproduced depends, of course,
upon the number of $\lambda$'s we decide to consider in the above
formula. As a first approximation we may omit all $\lambda$'s above the
order 2 or all terms in the exponent with indices higher than 2.
Bearing in mind that $i^2 = -1$ we therefore have as a first
approximation

$$ \varphi_0(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{(\lambda_1 - x)i\omega - \frac{\lambda_2}{2}\omega^2}\,d\omega. $$
The above definite integral was first evaluated by Laplace by means of
the following elegant analysis. Using the well known Eulerean relation
for complex quantities the above integral may be written as

$$ \int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos[(\lambda_1 - x)\omega]\,d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\sin[(\lambda_1 - x)\omega]\,d\omega. $$

The imaginary member vanishes because the factor
$e^{-\frac{\lambda_2}{2}\omega^2}$ is an even function and
$\sin[(\lambda_1 - x)\omega]$ an uneven function; the area from
$-\infty$ to 0 will therefore equal the area from 0 to $+\infty$, but
be opposite in sign, which reduces the total area from $-\infty$ to
$+\infty$ or the integral in question to
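zero.

In regard to the first term, similar conditions hold except that
$\cos[(\lambda_1 - x)\omega]$ is an even function and the integral may
hence be written as

$$ I = 2\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega \quad\text{where } r = \lambda_1 - x. $$

Regarding the parameter r as a variable and differentiating I in
respect to this variable we have

$$ \frac{dI}{dr} = -2\int_0^{\infty} \omega\,e^{-\frac{\lambda_2}{2}\omega^2}\sin(r\omega)\,d\omega. $$

From this we have by partial integration:

$$ \frac{dI}{dr} = \frac{2}{\lambda_2}\Big[e^{-\frac{\lambda_2}{2}\omega^2}\sin(r\omega)\Big]_0^{\infty} - \frac{2r}{\lambda_2}\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega = 0 - \frac{r}{\lambda_2}I $$

or

$$ \frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}. $$

From which we find

$$ \log I = -\frac{r^2}{2\lambda_2} + \log A $$

where $\log A$ is a constant. Hence we have:
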
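$$ I = A\,e^{-\frac{r^2}{2\lambda_2}}. $$

In order to determine A we let r = 0 and we have

$$ A = 2\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\,d\omega = \sqrt{2\pi} : \sqrt{\lambda_2}. $$

This finally gives the expression for $\varphi_0(x)$ in the following
form:

$$ \varphi_0(x) = \frac{1}{\sqrt{2\pi}\sqrt{\lambda_2}}\,e^{-\frac{(x - \lambda_1)^2}{2\lambda_2}} $$

as a preliminary approximation for the frequency
curve $\varphi(x)$.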
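Laplace's result may be compared with a direct numerical evaluation of the integral from which it was derived. The Python sketch below is an illustration under stated assumptions: the values of the two semi-invariants are hypothetical, the integral over omega is cut off at plus and minus 8 (adequate only when the second semi-invariant is not much smaller than unity), and the trapezoidal rule takes the place of exact integration. The function names are hypothetical as well.

```python
import cmath
import math


def first_approximation(x, lam1, lam2, cutoff=8.0, steps=1600):
    """(1 / 2 pi) * integral of exp((lam1 - x) i w - lam2 w^2 / 2) dw, trapezoidal rule."""
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps + 1):
        w = -cutoff + k * h
        weight = 0.5 if k in (0, steps) else 1.0     # end points count half
        total += weight * cmath.exp((lam1 - x) * 1j * w - lam2 * w * w / 2)
    return (total * h).real / (2 * math.pi)


def laplace_curve(x, lam1, lam2):
    """Closed form obtained above for the first approximation."""
    return math.exp(-(x - lam1) ** 2 / (2 * lam2)) / math.sqrt(2 * math.pi * lam2)


lam1, lam2 = 1.5, 4.0        # hypothetical semi-invariants
for x in (-2.0, 1.5, 3.0):
    print(x, first_approximation(x, lam1, lam2), laplace_curve(x, lam1, lam2))
```

For each x the two columns should agree to many decimal places, the imaginary part of the integral having vanished as the argument on even and uneven functions predicts.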

The first mathematical deduction of this approximate expression for a
frequency curve is found in the monumental work by Laplace on
Probabilities, and the function $\varphi_0(x)$ entering in the
expression $\varphi_0(x)\,dx$, which gives the probability that the
variate will fall between $x - \tfrac{1}{2}dx$ and
$x + \tfrac{1}{2}dx$, is therefore known as the Laplacean probability
function or sometimes as the Normal Frequency Curve of Laplace. The
same curve was, as we have mentioned previously, also deduced
independently by Gauss in connection with his studies on the
distribution of accidental errors in precision measurements.

Laplace's probability function, $\varphi_0(x)$, possesses some
remarkable properties which it might be well worth while to consider.
Introducing a slightly different system of notation by writing
$\lambda_1 = M$ and $\sqrt{\lambda_2} = \sigma$, $\varphi_0(x)$ reduces
to the following form

$$ \varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - M)^2}{2\sigma^2}} $$

which is the form introduced by Pearson.

The frequency curve, $\varphi_0(x)$, is here expressed in reference to
a Cartesian coordinate system with origin at the zero point of the
natural number system and whose unit of measurement is also equivalent
to the natural number unit. It is, however, not necessary to use this
system in preference to any other system. In fact, we may choose
arbitrarily any other origin and any other unit standard without
altering the properties of the curve. Suppose, therefore, that we take
M as the origin and $\sigma$ as the unit of the system. The frequency
function then reduces to

$$ \varphi_0(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2 : 2}. $$


Since the integral of $\varphi_0(x)$ from $-\infty$ to $+\infty$ equals
unity the following equation must necessarily hold.

$$ \int_{-\infty}^{+\infty} e^{-x^2 : 2}\,dx = \sqrt{2\pi}. $$

9. DEVELOPMENT BY POLYNOMIALS

The Laplacean Probability Function possesses, however, some other
remarkable properties which are of great use in expanding a function in
a series. Starting with $\varphi_0(x)$ we may by repeated
differentiation obtain its various derivatives. Denoting such
derivatives by $\varphi_1(x), \varphi_2(x), \varphi_3(x), \ldots$
respectively we have the following relations.¹)

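$$ \varphi_0(x) = e^{-x^2 : 2} $$
$$ \varphi_1(x) = -x\,\varphi_0(x) $$
$$ \varphi_2(x) = (x^2 - 1)\,\varphi_0(x) $$
$$ \varphi_3(x) = -(x^3 - 3x)\,\varphi_0(x) $$
$$ \varphi_4(x) = (x^4 - 6x^2 + 3)\,\varphi_0(x) $$

and in general for the nth derivative:

$$ \varphi_n(x) = (-1)^n\left[x^n - \frac{n(n-1)}{2}x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}x^{n-4} - \frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{2\cdot 4\cdot 6}x^{n-6} + \dots\right]\varphi_0(x). $$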


¹) In the following computations we have omitted temporarily the
constant factor $1 : \sqrt{2\pi}$ of $\varphi_0(x)$ and its
derivatives.

It can be readily seen that the derivatives of $\varphi_0(x)$ are
represented throughout as products of polynomials of x and the function
$\varphi_0(x)$ itself. The various polynomials

$$ H_0(x) = 1 $$
$$ H_1(x) = -x $$
$$ H_2(x) = x^2 - 1 $$
$$ H_3(x) = -(x^3 - 3x) $$
$$ H_4(x) = x^4 - 6x^2 + 3 $$

and so forth are generally known as Hermite's polynomials from the name
of the French mathematician, Hermite, who first introduced these
polynomials in mathematical analysis.

The following relations can be shown to exist between three successive
polynomials, taken with positive leading coefficient ($H_1 = x$,
$H_3 = x^3 - 3x$),

$$ H_{n+1}(x) - xH_n(x) + nH_{n-1}(x) = 0 $$


and 


A numerical 10 decimal place tabulation of the first six Hermite
polynomials for values of x up to 4 and progressing by intervals of
0.01 is given by Jørgensen in his Danish work "Frekvensflader og
Korrelation".
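The three-term relation affords a convenient means of generating the polynomials mechanically. The following Python sketch is an illustration only; the function names are hypothetical, each polynomial is stored as a list of coefficients in ascending powers of x, and the polynomials are taken with positive leading coefficient, so that the relation reads H_{n+1} = x H_n - n H_{n-1}.

```python
def hermite_polynomials(max_order):
    """Coefficient lists (ascending powers) of H_0 .. H_max_order.

    Built from H_{n+1}(x) = x * H_n(x) - n * H_{n-1}(x), H_0 = 1, H_1 = x.
    """
    polys = [[1], [0, 1]]
    for n in range(1, max_order):
        shifted = [0] + polys[n]                              # x * H_n
        previous = polys[n - 1] + [0] * (len(shifted) - len(polys[n - 1]))
        polys.append([p - n * q for p, q in zip(shifted, previous)])
    return polys[: max_order + 1]


def evaluate(coefficients, x):
    """Value of the polynomial at x by Horner's rule."""
    value = 0
    for coefficient in reversed(coefficients):
        value = value * x + coefficient
    return value


H = hermite_polynomials(4)
for order, coefficients in enumerate(H):
    print(order, coefficients)
print(evaluate(H[3], 2))      # x^3 - 3x at x = 2
```

The list printed for order 4 corresponds to x^4 - 6x^2 + 3, in agreement with the fourth derivative given above.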

There exist now some very important relations between the Hermite
polynomials and the derivatives of $\varphi_0(x)$, or between $H_n(x)$
and $\varphi_n(x)$.

Consider for the moment the two following series of functions

$$ \varphi_0(x),\ \varphi_1(x),\ \varphi_2(x),\ \varphi_3(x),\ \varphi_4(x),\ \ldots $$
$$ H_0(x),\ H_1(x),\ H_2(x),\ H_3(x),\ H_4(x),\ \ldots $$

where $\varphi_n(x) = H_n(x)\varphi_0(x)$ and where
$\lim \varphi_n(x) = 0$ for $x = \pm\infty$.

We shall now prove that the two series $\varphi_n(x)$ and $H_n(x)$ form
a biorthogonal system in the interval $-\infty$ to $+\infty$, that is
to say that they are

(1) real and continuous in the whole plane

(2) no one of them is identically zero in the plane

(3) every pair of them, $\varphi_n(x)$ and $H_m(x)$, satisfy the
relation

$$ \int_{-\infty}^{+\infty} \varphi_n(x)H_m(x)\,dx = 0 \qquad (n \neq m). $$

We have the self evident relation (letting x = z)

$$ \int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = \int_{-\infty}^{+\infty} H_m(z)H_n(z)\varphi_0(z)\,dz = \int_{-\infty}^{+\infty} H_n(z)\varphi_m(z)\,dz. $$

Since this relation holds for all values of m and n it is only
necessary to prove the proposition for n > m. For if it holds for
n > m it will according to the above relation also hold for n < m.

By partial integration we have:

$$ \int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = \Big[H_m(z)\varphi_{n-1}(z)\Big]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} H'_m(z)\varphi_{n-1}(z)\,dz $$

where $H'_m(z)$ is the first derivative of $H_m(z)$.

The first member on the right reduces to 0 since
$\varphi_{n-1}(z) = 0$ for $z = \pm\infty$. We have therefore:

$$ \int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = -\int_{-\infty}^{+\infty} H'_m(z)\varphi_{n-1}(z)\,dz $$
$$ \int_{-\infty}^{+\infty} H'_m(z)\varphi_{n-1}(z)\,dz = -\int_{-\infty}^{+\infty} H''_m(z)\varphi_{n-2}(z)\,dz $$
$$ \int_{-\infty}^{+\infty} H''_m(z)\varphi_{n-2}(z)\,dz = -\int_{-\infty}^{+\infty} H'''_m(z)\varphi_{n-3}(z)\,dz. $$

Continuing this process we obtain finally an expression of the form

$$ \int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = (-1)^{m+1}\int_{-\infty}^{+\infty} H_m^{(m+1)}(z)\,\varphi_{n-m-1}(z)\,dz $$

where $H_m^{(m+1)}(z)$ is the (m + 1)th derivative of $H_m(z)$ and
$n - m - 1 \geq 0$. Since $H_m(z)$ is a polynomial of the mth degree
its (m + 1)th derivative is zero and we have finally that

$$ \int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = 0 $$

for all values of m and n where $n \neq m$.

For m = n we proceed in exactly the same manner, but stop at the nth
integration. We have, therefore, by replacing m by n in the above
partial integrations

$$ \int_{-\infty}^{+\infty} H_n(z)\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty} H_n^{(n)}(z)\,\varphi_{n-n}(z)\,dz = (-1)^n\int_{-\infty}^{+\infty} H_n^{(n)}(z)\,\varphi_0(z)\,dz. $$

The nth derivative of $H_n(z)$ is, however, nothing but a constant and
equal to $(-1)^n\,n!$. Hence we have finally

$$ \int_{-\infty}^{+\infty} H_n(z)\varphi_n(z)\,dz = (-1)^n(-1)^n\,n!\int_{-\infty}^{+\infty} e^{-z^2 : 2}\,dz = n!\,\sqrt{2\pi}. $$

The above analysis thus proves that the functions $H_m(z)$ and
$\varphi_n(z)$ are biorthogonal to each other for all values of n
different from m throughout the whole plane.
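The biorthogonal relations just proved can be verified by numerical integration. The Python sketch below is an illustration under stated assumptions: the infinite interval is cut off at plus and minus 12, the trapezoidal rule replaces exact integration, the constant factor of the generating function is omitted as in the footnote of this section, and the polynomials carry the alternating sign of the derivatives, so that H_1 = -z and H_3 = -(z^3 - 3z). The function names are hypothetical.

```python
import math


def H(n, z):
    """Hermite polynomial with the alternating sign of the nth derivative."""
    if n == 0:
        return 1.0
    previous, current = 1.0, z
    for k in range(1, n):
        previous, current = current, z * current - k * previous
    return (-1) ** n * current


def phi(n, z):
    """nth derivative of exp(-z^2 / 2), written as H_n(z) * phi_0(z)."""
    return H(n, z) * math.exp(-z * z / 2)


def inner(m, n, cutoff=12.0, steps=2400):
    """Trapezoidal estimate of the integral of H_m(z) * phi_n(z) over the line."""
    h = 2 * cutoff / steps
    total = 0.0
    for k in range(steps + 1):
        z = -cutoff + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * H(m, z) * phi(n, z)
    return total * h


for m in range(4):
    print([round(inner(m, n), 6) for n in range(4)])
print(inner(3, 3), math.factorial(3) * math.sqrt(2 * math.pi))
```

Up to rounding, the printed table has zeros everywhere off the diagonal, while the diagonal entries equal n! times the square root of 2 pi.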

We can now make use of these relations between the infinite set of
biorthogonal functions $H_m(z)$ and $\varphi_n(z)$ in solving the
problem of expanding an arbitrary function $\varphi(z)$ in a series of
the form

$$ \varphi(z) = c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \dots $$

the series to hold in the interval from $-\infty$ to $+\infty$.

If we know that $\varphi(z)$ can be developed into a series of this
form, which after multiplication by any continuous function can be
integrated term for term, then we are able to give a formal
determination of the coefficients c.

This formal determination of any one of the c's, say $c_i$, consists in
multiplying the above series by $H_i(z)$ and integrating each term from
$-\infty$ to $+\infty$. All the terms except the one containing the
product $H_i(z)\varphi_i(z)$ vanish and we have for $c_i$

$$ c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{i!\,\sqrt{2\pi}}. $$

If we define the Hermite functions as

$$ H_0(z) = 1 $$
$$ H_1(z) = z $$
$$ H_2(z) = z^2 - 1 $$
$$ H_3(z) = z^3 - 3z $$
$$ H_4(z) = z^4 - 6z^2 + 3 $$

the above formula takes on the form

$$ c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{(-1)^i\,i!\,\sqrt{2\pi}} $$

which we shall prefer to use in the following discussion.

It will be noted that this purely formal calculation of the
coefficients c is very similar to the determination of the constants in
a Fourier Series, where as a matter of fact the system of functions

$$ \cos z,\ \cos 2z,\ \cos 3z,\ \ldots $$
$$ \sin z,\ \sin 2z,\ \sin 3z,\ \ldots $$

is biorthogonal in the interval $0 < z < 2\pi$.

But the reader must not forget that the above representation is only a
formal one, and we do not know if it is valid. To prove its validity we
must first show that the series is convergent and secondly that it
actually represents $\varphi(z)$ for all values of z.

This is by no means a simple task and it cannot be done by elementary
methods. A Russian mathematician, Vera Myller-Lebedeff, has, however,
given an elegant solution by means of some well known theorems from the
Fredholm integral equations. She has among other things proved the
following criterion:

"Every function $\varphi(z)$ which together with its first two
derivatives is finite and continuous in the interval from $-\infty$ to
$+\infty$ and which vanishes together with its derivatives for
$z = \pm\infty$ can be developed into an infinite series of the form:

$$ \varphi(z) = \sum_{i=0}^{\infty} c_i\,H_i(z)\,\varphi_0(z) $$

where $H_i(z)$ is the Hermite polynomial of order i".


10. GRAM'S SERIES

It is, however, not our intention to follow up this treatment, which is
outside the scope of an elementary treatise like this, and we shall in
its place give an approximate representation of the frequency function,
$\varphi(z)$, by a method which in many respects is similar to that
introduced by the Danish actuary Gram in his epoch-making work
"Udviklingsrækker", which contains the first known systematic
development of a skew frequency function. Gram's problem in a somewhat
modified form may briefly be stated as follows: Being given an
arbitrary relative frequency function, $\varphi(z)$, continuous and
finite in the interval $-\infty$ to $+\infty$ (and which vanishes for
$z = \pm\infty$), to determine the constant coefficients $c_0, c_1,
c_2, c_3, \ldots$ in such a way that the series

$$ \frac{c_0\varphi_0(z)}{\sqrt{\varphi_0(z)}} + \frac{c_1\varphi_1(z)}{\sqrt{\varphi_0(z)}} + \frac{c_2\varphi_2(z)}{\sqrt{\varphi_0(z)}} + \dots + \frac{c_n\varphi_n(z)}{\sqrt{\varphi_0(z)}} $$

gives the best approximation to the quantity
$\varphi(z) : \sqrt{\varphi_0(z)}$ in the sense of the method of least
squares. That is to say we wish to determine the constants c in such a
manner that the sum of the squares of the differences between the
function and the approximate series becomes a minimum. This means that
the expression

$$ \int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - \sum_{i=0}^{n}\frac{c_i\varphi_i(z)}{\sqrt{\varphi_0(z)}}\right]^2 dz $$

must be a minimum.

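On the basis of this condition we have

$$ I = \int_{-\infty}^{+\infty}\frac{1}{\varphi_0(z)}\left[\varphi(z) - \sum_{j=0}^{n} c_j\varphi_j(z)\right]^2 dz $$

where the unknown coefficients c must be so determined that I equals a
minimum.

Taking the partial derivatives in respect to $c_i$ we have

$$ \frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi_i(z)}{\varphi_0(z)}\varphi(z)\,dz + 2\sum_{j=0}^{n} c_j\int_{-\infty}^{+\infty}\frac{\varphi_i(z)\varphi_j(z)}{\varphi_0(z)}\,dz. $$

Now since $\varphi_i(z) = (-1)^i H_i(z)\varphi_0(z)$ and

$$ \int_{-\infty}^{+\infty} H_i(z)\varphi_j(z)\,dz = 0 \qquad (j \neq i) $$

we get

$$ \frac{\partial I}{\partial c_i} = -2(-1)^i\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz + 2c_i(-1)^i\int_{-\infty}^{+\infty} H_i(z)\varphi_i(z)\,dz $$

where the latter integral equals

$$ \int_{-\infty}^{+\infty} H_i(z)\varphi_i(z)\,dz = (-1)^i\,i!\,\sqrt{2\pi}. $$

Equating to zero and solving for $c_i$ we finally obtain the following
value for $c_i$:

$$ c_i = \frac{(-1)^i}{i!\,\sqrt{2\pi}}\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz \qquad (i = 1, 2, 3, \ldots). $$

This solution is gotten by the introduction of $\sqrt{\varphi_0(z)}$
which serves to make all terms of the form
$c_i\varphi_i(z) : \sqrt{\varphi_0(z)} = (-1)^i\sqrt{\varphi_0(z)}\,c_iH_i(z)$
(i = 1, 2, 3, ..., n) orthogonal to each other in the interval
$-\infty$ to $+\infty$.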

In  all  the  above  expansions  of  a  frequency 
series  we  have  used  the  expression  (p^  (z)  =  e~^l* 
as  the  generating  function  (see  footnote  on  page 
26),  while  as  a  matter  of  fact  the  true  value  of 
cpo  (z)  is  given  by  the  equation  cp^  (z)  =  e~^*'^ :  |/2  tt. 

The  definite  integral  on  page  32 

(-  1/  S  Hi{z)^i{z)dz  =  \±  \e-^'-^dz  =  [£  )/2^ 

—  CO  — 00 

will  therefore  have  to  be  divided  by  l/27r,  and 
the  value  of  the  general  coefficient  Ci  will  hence- 
forth be  reduced  to 

\^(z)H,{z)dz 

where  Hi  {z)  is  the  Hermite  polynomial  of  order 
i  defined  by  the  relation 

i  (i  —  1)  ji  —  2)  {i  —  3)  (i  —  4)  {i  —  5)  z'-^ 

2-4.6  "^•" 
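The series defining $H_i(z)$ can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names `hermite_coefficients` and `hermite_value` are illustrative, and the general term is assumed to be $(-1)^k\, i!/[(i-2k)!\,2^k\,k!]\; z^{i-2k}$, which agrees with the terms printed above.

```python
from math import factorial


def hermite_coefficients(i):
    """Return {power: coefficient} for H_i(z), built from the explicit series
    z^i - i(i-1)/2 z^(i-2) + i(i-1)(i-2)(i-3)/(2*4) z^(i-4) - ..."""
    coefficients = {}
    k = 0
    while i - 2 * k >= 0:
        # numerator i(i-1)...(i-2k+1) equals i!/(i-2k)!
        numerator = factorial(i) // factorial(i - 2 * k)
        # denominator 2*4*...*(2k) equals 2^k * k!
        denominator = (2 ** k) * factorial(k)
        sign = -1 if k % 2 else 1
        coefficients[i - 2 * k] = sign * (numerator // denominator)
        k += 1
    return coefficients


def hermite_value(i, z):
    """Evaluate H_i(z) from its coefficient table."""
    return sum(c * z ** power for power, c in hermite_coefficients(i).items())


if __name__ == "__main__":
    for order in range(5):
        print(order, hermite_coefficients(order))
```

For $i = 3$ and $i = 4$ this gives $z^3 - 3z$ and $z^4 - 6z^2 + 3$, the polynomials that appear in the coefficients $c_3$ and $c_4$.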
On this basis we obtain the following values for the first four coefficients:

$$c_1 = (-1)^1\int_{-\infty}^{+\infty}z\,\varphi(z)dz : 1!$$

$$c_2 = (-1)^2\int_{-\infty}^{+\infty}(z^2 - 1)\varphi(z)dz : 2!$$

$$c_3 = (-1)^3\int_{-\infty}^{+\infty}(z^3 - 3z)\varphi(z)dz : 3!$$

$$c_4 = (-1)^4\int_{-\infty}^{+\infty}(z^4 - 6z^2 + 3)\varphi(z)dz : 4!$$
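As a numerical illustration of these four integrals, the sketch below evaluates them by the midpoint rule for a test density that is not taken from the text: the uniform distribution on $[-\sqrt{3}, +\sqrt{3}]$, chosen only because it has mean 0 and unit dispersion, so that $c_1$ and $c_2$ should vanish. All names are hypothetical.

```python
from math import factorial, sqrt


def hermite(i, z):
    """Hermite polynomials H_0..H_4 in the convention of the text."""
    table = {
        0: 1.0,
        1: z,
        2: z ** 2 - 1,
        3: z ** 3 - 3 * z,
        4: z ** 4 - 6 * z ** 2 + 3,
    }
    return table[i]


def gram_charlier_coefficient(density, i, lower, upper, steps=20000):
    """c_i = (-1)^i / i! * integral of density(z) * H_i(z) dz (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        z = lower + (k + 0.5) * width
        total += density(z) * hermite(i, z)
    return (-1) ** i * total * width / factorial(i)


HALF_WIDTH = sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1


def uniform_density(z):
    return 1.0 / (2 * HALF_WIDTH) if -HALF_WIDTH <= z <= HALF_WIDTH else 0.0


coefficients = [
    gram_charlier_coefficient(uniform_density, i, -HALF_WIDTH, HALF_WIDTH)
    for i in range(5)
]

if __name__ == "__main__":
    for i, c in enumerate(coefficients):
        print(f"c_{i} = {c:+.5f}")
```

For this symmetric test density the odd coefficients vanish, and $c_4 = (9/5 - 6 + 3) : 4! = -0.05$, reflecting its negative excess.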

While the above development of an arbitrary frequency distribution has reference to $\varphi(z)$, or the relative frequency function, it is, however, equally well adapted to the representation of absolute frequencies as expressed by the function $F(z)$. If $N$ is the total number of individual observations, or in other words the area of the frequency curve, we evidently have

$$F(z) = N\varphi(z) \quad\text{or}\quad \int_{-\infty}^{+\infty}F(z)dz = N\int_{-\infty}^{+\infty}\varphi(z)dz = N.$$

Since $N$ is a constant quantity we may, therefore, write the expansion of $F(z)$ as follows:

$$F(z) = N[c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots] = N\sum_i c_i\varphi_i(z),$$

where the coefficients $c_i$ have the value

$$c_i = \frac{(-1)^i}{i!}\int_{-\infty}^{+\infty}F(z)H_i(z)dz : N \quad\text{for } i = 1, 2, 3, \ldots$$

and where

$$N = \int_{-\infty}^{+\infty}F(z)dz.$$

Since all the Hermite functions are polynomials in $z$, it can be readily seen that the coefficients $c$ may be expressed as functions of the power sums or of the previously mentioned symmetrical functions $s$, where

$$s_r = \int_{-\infty}^{+\infty}z^rF(z)dz.$$

These particular integrals, originally introduced by Thiele in the development of the semi-invariants, have been called by Pearson the "moments" of the frequency function $F(z)$, and $s_r$ is called the $r$th moment of the variate $z$ with respect to an arbitrary origin.

It can be readily seen that the moment of order zero, or $s_0$, is

$$s_0 = \int_{-\infty}^{+\infty}z^0F(z)dz = N = N\int_{-\infty}^{+\infty}\varphi(z)dz.$$

Hence we have for the first coefficient $c_0$:

$$c_0 = \int_{-\infty}^{+\infty}F(z)dz : \int_{-\infty}^{+\infty}F(z)dz = 1.$$

We are, however, in a position to further simplify the expression for $F(z)$.

As already mentioned, we are at liberty to choose arbitrarily both the origin and the unit of the Cartesian coordinate system for the frequency curve without changing the properties of this curve. Now by making a proper choice of the Cartesian system of reference we can make the coefficients $c_1$ and $c_2$ vanish. In order to obtain this object the origin of the system must be so chosen that

$$c_1 = \frac{-1}{1!}\int_{-\infty}^{+\infty}zF(z)dz : \int_{-\infty}^{+\infty}F(z)dz = 0.$$

This means that the semi-invariant $\lambda_1 = s_1 : s_0$ must vanish. It can be readily seen that the above expression for $\lambda_1$ is nothing more than the usual form for the mean value of a series of variates. Moreover, we know that the algebraic sum (or in the case of continuous variates, the integral) of the variates around the mean value is always equal to zero. Hence by writing for $z$ the expression $(z - M)$, where $M$ equals the mean value or $\lambda_1$, we can always make $c_1$ vanish.

To attain our second object of making $c_2$ vanish we must choose the unit of the coordinate system in such a way that the expression

$$c_2 = \frac{1}{2!}\int_{-\infty}^{+\infty}F(z)H_2(z)dz : \int_{-\infty}^{+\infty}F(z)dz = 0,$$

which implies that

$$\left[\int_{-\infty}^{+\infty}F(z)z^2dz - \int_{-\infty}^{+\infty}F(z)dz\right] : \int_{-\infty}^{+\infty}F(z)dz = 0,$$

or that $s_2 : s_0 - 1 = 0$, or when expressed in terms of the semi-invariants that

$$\lambda_2 = (s_2s_0 - s_1^2) : s_0^2.$$

But by choosing the mean as the origin of the system the term $s_1^2 : s_0^2$ is equal to 0 and we have therefore $\lambda_2 = \sigma^2 = s_2 : s_0 = 1$. Hence, by selecting as the unit of our coordinate system $\sqrt{\lambda_2}$ or $\sigma$, where $\sigma$ is technically known as the dispersion or standard deviation of the series of variates, we can make the second coefficient $c_2$ vanish.

In respect to the coefficients $c_3$ and $c_4$ we have now

$$c_3 = \frac{(-1)^3}{3!}\left[\int_{-\infty}^{+\infty}z^3F(z)dz - 3\int_{-\infty}^{+\infty}zF(z)dz\right] : \int_{-\infty}^{+\infty}F(z)dz,$$

which reduces to

$$c_3 = \frac{(-1)^3}{3!}\cdot\frac{s_3}{s_0},$$

while

$$c_4 = \frac{1}{4!}\left[\int_{-\infty}^{+\infty}z^4F(z)dz - 6\int_{-\infty}^{+\infty}z^2F(z)dz + 3\int_{-\infty}^{+\infty}F(z)dz\right] : \int_{-\infty}^{+\infty}F(z)dz,$$

which reduces to

$$c_4 = \left[\frac{s_4}{s_0} - \frac{6s_2}{s_0} + \frac{3s_0}{s_0}\right] : 4! = \left[\frac{s_4}{s_0} - 3\right] : 4!.$$

While the coefficients of higher order may be determined with equal ease, it will in general be found that the majority of moderately skew frequency distributions can be expressed by means of the first 4 parameters or coefficients.


11. COEFFICIENTS EXPRESSED AS SEMI-INVARIANTS

We shall now show how the same results for the values of the coefficients may be obtained from the definition of the semi-invariants. Since we have proven that a frequency function, $F(z)$, may be expressed by the series

$$F(z) = N\sum_i c_i\varphi_i(z)$$

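we may from the definition of the semi-invariants write down the following identity:

$$N e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \cdots} = \int_{-\infty}^{+\infty}e^{z\omega}F(z)dz = N\sum_i c_i\int_{-\infty}^{+\infty}e^{z\omega}\varphi_i(z)dz,$$

where $N$ is the area of the frequency curve.

The general term on the right hand side of the equation will be of the form

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)dz,$$

where the integral may be evaluated by partial integration as follows:

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)dz = \Big[e^{z\omega}\varphi_{r-1}(z)\Big]_{-\infty}^{+\infty} - \omega\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)dz,$$

and where the first term on the right vanishes, leaving

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)dz = -\omega\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)dz.$$

Continuing in the same manner we obtain by successive integrations

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)dz = (-\omega)^2\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-2}(z)dz = (-\omega)^3\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-3}(z)dz,$$

from which we finally obtain the relation

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)dz = (-\omega)^r\int_{-\infty}^{+\infty}e^{z\omega}\varphi_0(z)dz.$$

This latter integral may be written as

$$(-\omega)^r e^{\frac{\omega^2}{2}}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{(z-\omega)^2}{2}}dz = (-\omega)^r e^{\frac{\omega^2}{2}}.$$

Consequently the relation between the semi-invariants and the frequency function may be written as follows:

$$N e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \cdots} = N e^{\frac{\omega^2}{2}}\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \ldots\right]$$

or

$$N e^{\frac{\lambda_1\omega}{1!} + \frac{\omega^2}{2!}(\lambda_2 - 1) + \frac{\lambda_3\omega^3}{3!} + \cdots} = N\left[c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + \ldots\right].$$

By successive differentiation with respect to $\omega$ and by equating the coefficients of equal powers of $\omega$ we get, in a manner similar to that shown on page 13, the following results:

$$c_0 = 1, \qquad c_1 = -\lambda_1, \qquad \ldots$$

If we now again choose the origin at $\lambda_1$, or let $\lambda_1 = 0$, and choose $\sqrt{\lambda_2} = 1$ as the unit of our coordinate system we have:

$$c_0 = 1, \quad c_1 = 0, \quad c_2 = 0, \quad c_3 = -\frac{1}{3!}\lambda_3, \quad c_4 = \frac{1}{4!}\lambda_4.$$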
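The last step can be written out as a short worked expansion. This is a sketch added for illustration; it assumes only $\lambda_1 = 0$ and $\lambda_2 = 1$, as chosen above, and keeps terms up to $\omega^4$.

```latex
% Assumes \lambda_1 = 0 and \lambda_2 = 1, so the exponent starts at \omega^3.
\begin{aligned}
e^{\frac{\lambda_3\omega^3}{3!} + \frac{\lambda_4\omega^4}{4!} + \cdots}
  &= 1 + \frac{\lambda_3}{3!}\,\omega^3 + \frac{\lambda_4}{4!}\,\omega^4 + O(\omega^5) \\
  &= c_0 - c_1\omega + c_2\omega^2 - c_3\omega^3 + c_4\omega^4 - \cdots
\end{aligned}
```

Equating equal powers of $\omega$ gives $c_0 = 1$, $c_1 = c_2 = 0$, $-c_3 = \lambda_3 : 3!$ and $c_4 = \lambda_4 : 4!$, in agreement with the values obtained from the moments in the preceding paragraph.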
12. LINEAR TRANSFORMATION

The theoretical development of the above formulae explicitly assumes that the variate, $z$, is measured in terms of the dispersion or $\sqrt{\lambda_2(z)}$ and with $\lambda_1(z)$ as the origin of the coordinate system. In practice the observations or statistical data are, however, invariably expressed with reference to an arbitrarily chosen origin (in the majority of cases the natural zero of the number scale) and expressed in terms of standard units, such as centimeters, grams, years, integral numbers, etc. Let us denote the general variate in such arbitrarily selected systems of reference by $x$. Our problem then consists in transforming the various semi-invariants, $\lambda_1(x), \lambda_2(x), \lambda_3(x), \lambda_4(x), \ldots$ to the $z$ system of reference with $\lambda_1(z)$ as its origin and $\sqrt{\lambda_2(z)}$ as its unit. Such a transformation may always be brought about by means of the linear substitution

$$z = ax + b,$$

which in a purely geometrical sense implies both a change of origin and unit. On page 16 we proved the following general properties of the semi-invariants:

$$\lambda_1(z) = \lambda_1(ax + b) = a\lambda_1(x) + b$$
$$\lambda_r(z) = \lambda_r(ax + b) = a^r\lambda_r(x).$$
Let us now write $\lambda_1(x) = M$ and $\lambda_2(x) = \sigma^2$; we then have the following relations:

$$\lambda_1(z) = aM + b$$
$$\lambda_2(z) = a^2\sigma^2.$$

Since the coordinate system of reference must be chosen in such a manner that $\lambda_1(z) = 0$ and $\sqrt{\lambda_2(z)} = 1$ we have:

$$aM + b = 0$$
$$a\sigma = 1,$$

from which we obtain $a = \frac{1}{\sigma}$ and $b = -\frac{M}{\sigma}$, which brings $z$ on the form $z = (x - M) : \sigma$, while $\varphi_0(z)$ becomes

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x - M)^2 : 2\sigma^2}.$$

Moreover, we have $\lambda_r(z) = \lambda_r(x) : \sigma^r$ for all values of $r \geq 2$. We are now able to epitomize the computations of the semi-invariants under the following simple rules.

(1) Compute $\lambda_1(x)$ in respect to an arbitrary origin. The numerical value of this parameter with opposite sign is the origin of the frequency curve.

(2) Compute $\lambda_r(x)$ for all values of $r \geq 2$. The numerical values of those parameters divided with $(\sqrt{\lambda_2(x)})^r$, or $\sigma^r$, for $r = 2, 3, 4, \ldots$ are the semi-invariants of the frequency curve.
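The two rules can be sketched in a few lines of code. The function name and the list layout are assumptions made for this illustration; the sample values are the first four semi-invariants quoted for the de Vries series in paragraph 17.

```python
from math import sqrt


def standardize_semi_invariants(lambdas_x):
    """lambdas_x[r - 1] holds lambda_r(x) for r = 1, 2, 3, ...

    Returns (a, b, lambdas_z) for the substitution z = a*x + b chosen so that
    lambda_1(z) = 0 and lambda_2(z) = 1.
    """
    mean = lambdas_x[0]
    sigma = sqrt(lambdas_x[1])
    a = 1.0 / sigma
    b = -mean / sigma
    # Rule (1): the first semi-invariant transforms as a*lambda_1 + b.
    lambdas_z = [a * mean + b]
    # Rule (2): every higher semi-invariant is divided by sigma^r.
    for r in range(2, len(lambdas_x) + 1):
        lambdas_z.append(a ** r * lambdas_x[r - 1])
    return a, b, lambdas_z


a, b, lambdas_z = standardize_semi_invariants([0.631, 0.917, 1.644, 3.377])

if __name__ == "__main__":
    print("a =", round(a, 4), " b =", round(b, 4))
    print("standardized semi-invariants:", [round(v, 4) for v in lambdas_z])
```

After the transformation the first two semi-invariants are 0 and 1 by construction, so only the third and higher ones carry information about the shape of the curve.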
13. CHARLIER'S SCHEME OF COMPUTATION

The general formulae for the semi-invariants were given on page 13. In practical work it is, however, of importance to proceed along systematic lines and to furnish an automatic check for the correctness of the computations. Several systems facilitating such work have been proposed by various writers, but the most simple and elegant is probably the one proposed by M. Charlier, which is shown in detail with the necessary control checks on the following page. Charlier employs moments, while we in the following demonstration shall prefer the use of the semi-invariants.

If we define the power sums of the relative frequencies $\varphi(x)$ by the relation

$$m_r = \int_{-\infty}^{+\infty}x^rF(x)dx : \int_{-\infty}^{+\infty}F(x)dx \qquad (r = 0, 1, 2, 3, \ldots),$$

we find that the expressions for the semi-invariants as given on page 13 may be written as follows:

$$\lambda_1 = m_1$$
$$\lambda_2 = m_2 - m_1^2$$
$$\lambda_3 = m_3 - 3m_2m_1 + 2m_1^3$$
$$\lambda_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4$$
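These four relations translate directly into code. The sketch below is illustrative only (the function name is not from the text); it is applied to the crude power sums $s_0, \ldots, s_4$ of the de Vries series quoted in paragraph 17, so that the result can be compared with the semi-invariants printed there.

```python
def semi_invariants_from_power_sums(s):
    """s = [s0, s1, s2, s3, s4]; returns [lambda_1, lambda_2, lambda_3, lambda_4]."""
    m = [value / s[0] for value in s]  # m_r = s_r / s_0
    lambda_1 = m[1]
    lambda_2 = m[2] - m[1] ** 2
    lambda_3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
    lambda_4 = (m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2
                + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4)
    return [lambda_1, lambda_2, lambda_3, lambda_4]


# Power sums of the de Vries petal counts about the provisional origin 5.
lambdas = semi_invariants_from_power_sums([222, 140, 292, 806, 2752])

if __name__ == "__main__":
    print([round(v, 3) for v in lambdas])
```

The computed values agree with the printed 0.631, 0.917, 1.644 and 3.377 to within about one unit in the third decimal.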
The advantage of the Charlier scheme for the computation of the semi-invariants lies in the fact that it furnishes an automatic check of the final results. If we expand the expression $(x + 1)^4F(x)$ we have:

$$x^4F(x) + 4x^3F(x) + 6x^2F(x) + 4xF(x) + F(x)$$

or

$$\sum(x + 1)^4F(x) = s_4 + 4s_3 + 6s_2 + 4s_1 + s_0,$$

which serves as an independent control check of the computations. Moreover, another check is furnished by the relation

$$m_4 = \lambda_4 + 4m_1\lambda_3 + 6m_1^2\lambda_2 + 3\lambda_2^2 + m_1^4.$$

In  order  to  illustrate  the  scheme  we  choose  the 
following  age  distribution  of  1130  pensioned  func- 
tionaries in  a  large  American  Public  Utility  cor- 
poration. 


Ages      No. of Pensioners      Ages      No. of Pensioners
35-39              1             65-69           286
40-44              6             70-74           248
45-49             17             75-79           128
50-54             48             80-84            38
55-59            118             85-89            13
60-64            224             over 90           3

The  complete  calculations  of  the  coefficients  c 
are  shown  in  the  appended  scheme  by  Charlier. 
The above computations give the numerical values of the frequency function, which now may be written as follows:

$$F(x) = 1130\,[\varphi_0(x) + .0258\,\varphi_3(x) + .0158\,\varphi_4(x)]$$

where

$$\varphi_0(x) = \frac{1}{1.624\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x + .0195}{1.6240}\right)^2}.$$

14. COMPARISON BETWEEN OBSERVED AND THEORETICAL VALUES

The next step is now to find the numerical values of $F(x)$ for various values of $x$ and compare such values with the ones originally observed. This process is shown in detail in the following scheme.

Column (1) gives the values of the variate $x$ reckoned from the provisional origin, or the centre of the age interval 65-69. (2) is $x$ less the first semi-invariant, whereby the origin is shifted to the mean or $\lambda_1$. Column (3) represents the final linear transformation: $z = (x - \lambda_1) : \sigma$.

Columns (4), (5) and (6) are copied directly from the standard tables of Jørgensen or Charlier. Column (7) is (5) multiplied by 0.0258 or the product $-[c_3\varphi_3(z)] : 3!$, while (8) is $[c_4\varphi_4(z)] : 4!$.

Column (9) is the sum of (4), (7) and (8). If we now distribute the area $N = s_0$ or 1130 pro rata according to (9), we finally reach the theoretical frequency distribution expressed in 5-year age intervals and shown in column (10), alongside which we have inserted the originally observed values. Evidently the fit is satisfactory. It will be noted that the final frequency series is expressed in units of 5-year age intervals. This, however, is only a formal representation. By subdividing the unit intervals of column (1) in 5 equal parts, and by computing all the other columns accordingly, we get the theoretical frequency series expressed in single year age intervals.
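The comparison can be retraced numerically from the constants quoted in paragraph 13. The sketch below is an illustration under stated assumptions, not a reproduction of the printed scheme: $\varphi_3$ and $\varphi_4$ are taken as the third and fourth derivatives of the normal function, i.e. $-H_3\varphi_0$ and $+H_4\varphi_0$; the classes are coded $x = -6, \ldots, +5$; and the factor $N : \sigma$ converts the curve to frequencies per 5-year class.

```python
from math import exp, pi, sqrt

N, MEAN, SIGMA = 1130, -0.0195, 1.624   # constants quoted in the text
C3, C4 = 0.0258, 0.0158                 # coefficients of phi_3 and phi_4


def phi0(z):
    return exp(-z * z / 2) / sqrt(2 * pi)


def phi3(z):
    # third derivative of phi0 (assumed convention): -H_3(z) * phi0(z)
    return -(z ** 3 - 3 * z) * phi0(z)


def phi4(z):
    # fourth derivative of phi0 (assumed convention): +H_4(z) * phi0(z)
    return (z ** 4 - 6 * z ** 2 + 3) * phi0(z)


def fitted_frequency(x):
    z = (x - MEAN) / SIGMA
    return N / SIGMA * (phi0(z) + C3 * phi3(z) + C4 * phi4(z))


observed = [1, 6, 17, 48, 118, 224, 286, 248, 128, 38, 13, 3]
fitted = [fitted_frequency(x) for x in range(-6, 6)]

if __name__ == "__main__":
    for x, obs, fit in zip(range(-6, 6), observed, fitted):
        print(f"x = {x:+d}   observed {obs:4d}   fitted {fit:7.1f}")
```

Under these assumptions the fitted series peaks in the class 65-69, like the observed one, and its total stays close to 1130.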

15. THE PRINCIPLE OF METHOD OF LEAST SQUARES

The following paragraph purports to give a brief exposition of the determination of the coefficients in the Gram or Laplacean-Charlier series in the sense of the method of least squares as a strict problem of maxima and minima, wholly independent of the connection between the method of least squares and the error laws of precision measurements.¹

¹ In the following demonstration I am adhering to the brief and lucid exposition of the Argentinean actuary, U. Broggi, in his excellent Traité d'Assurances sur la Vie.

The simple problem in maxima and minima which forms the fundamental basis of the method of least squares is the following: Let $m$ unknown quantities be determined by observations in such a manner that they are not observed directly but enter into certain known functional relations, $f_i(x_1, x_2, x_3, \ldots, x_m)$, containing the unknown independent variables, $x_1, x_2, x_3, \ldots, x_m$. Let furthermore the number of observations on such functional relations be $n$ (where $n$ is greater than $m$). The problem is then to determine the most plausible system of the values of the unknowns from the observed system

$$f_1(x_1, x_2, x_3, \ldots, x_m) = o_1$$
$$f_2(x_1, x_2, x_3, \ldots, x_m) = o_2$$
$$\cdots$$
$$f_n(x_1, x_2, x_3, \ldots, x_m) = o_n$$

where $f_1, f_2, \ldots, f_n$ are the known functional relations and $o_1, o_2, o_3, \ldots, o_n$ their observed values. Such equations are known as observation equations.

In order to further simplify our problem we shall also assume that

1 All the equations of the system have the same weight, and

2 All the equations are reduced to linear form.

By these assumptions the problem is reduced to find $m$ unknowns from $n$ linear equations.


$$a_1x_1 + b_1x_2 + \ldots = o_1$$
$$\cdots$$
$$a_nx_1 + b_nx_2 + \ldots = o_n$$

Since $n$ is greater than $m$ we find the problem over-determined, and we therefore seek to determine the unknown quantities, $x_1, x_2, \ldots, x_m$, in such a way that the sum of the squares of the differences between the functional relations and the observed values, $o$, becomes a minimum. This implies that the expression

$$\sum_{i=1}^{n}(a_ix_1 + b_ix_2 + \ldots - o_i)^2 = \psi(x_1, x_2, \ldots, x_m)$$

must be a minimum, or the simultaneous existence of the equations

$$\frac{\partial\psi}{\partial x_1} = 0, \quad \frac{\partial\psi}{\partial x_2} = 0, \quad \ldots, \quad \frac{\partial\psi}{\partial x_m} = 0. \qquad (I)$$

If we now introduce the following notation

$$a_ix_1 + b_ix_2 + \ldots - o_i = \lambda_i \quad\text{for } i = 1, 2, 3, \ldots, n,$$

the $m$ equations in the above system (I) evidently take on the following form

$$\lambda_1a_1 + \lambda_2a_2 + \ldots + \lambda_na_n = 0$$
$$\lambda_1b_1 + \lambda_2b_2 + \ldots + \lambda_nb_n = 0$$
$$\cdots$$

If we now again re-substitute the expressions for $\lambda$ in terms of the linear relations

$$a_ix_1 + b_ix_2 + \ldots - o_i = \lambda_i, \quad\text{for } i = 1, 2, 3, \ldots, n,$$

and collect the coefficients of $x_1, x_2, \ldots, x_m$, these equations may be expressed in the following symbolical form:

$$[aa]x_1 + [ab]x_2 + \ldots - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + \ldots - [bo] = 0$$
$$\cdots$$
$$[ak]x_1 + [bk]x_2 + \ldots + [kk]x_m - [ko] = 0$$

where

$$[aa] = a_1a_1 + a_2a_2 + \ldots$$
$$[ab] = a_1b_1 + a_2b_2 + \ldots$$

is the Gaussian notation for the homogeneous sum products.

The above equations are known as normal equations, and it is readily seen that there is one normal equation corresponding to each unknown. Our problem is therefore reduced to the solution of a system of simultaneous linear equations of $m$ unknowns. If $m$ is a small number, or, what amounts to the same thing, there are only two or three unknowns, the solution can be carried on by simple algebraic methods or determinants. If the number of unknowns is large these methods become very laborious and impractical. It is one of the achievements of the great German mathematician, Gauss, to have given us a method of solution which reduces this labor to a minimum and which proceeds along well defined systematic and practical lines. The method is known as the Gaussian algorithm of successive elimination.
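The formation of the sum products $[aa], [ab], \ldots, [ao]$ from a set of observation equations can be sketched as follows. The function name and the small data set (four observations of a straight line $x_1 + x_2t$) are hypothetical and serve only to show the mechanics.

```python
def normal_equations(rows, observed):
    """rows[i] = [a_i, b_i, ...] and observed[i] = o_i.

    Returns (matrix, rhs) where matrix[p][q] is the sum product of columns
    p and q (e.g. [ab]) and rhs[p] is the sum product with the observations
    (e.g. [ao]), so that the normal equations read matrix * x = rhs.
    """
    unknowns = len(rows[0])
    matrix = [
        [sum(row[p] * row[q] for row in rows) for q in range(unknowns)]
        for p in range(unknowns)
    ]
    rhs = [
        sum(row[p] * o for row, o in zip(rows, observed))
        for p in range(unknowns)
    ]
    return matrix, rhs


# Hypothetical example: n = 4 observation equations, m = 2 unknowns.
rows = [[1, 0], [1, 1], [1, 2], [1, 3]]
observed = [1.1, 2.9, 5.2, 6.8]
matrix, rhs = normal_equations(rows, observed)

if __name__ == "__main__":
    print("sum products:", matrix)
    print("right-hand side:", rhs)
```

The matrix of sum products is symmetrical, which is why only its upper triangle needs to be written down in the hand schedules.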


16. GAUSS' SOLUTION OF NORMAL EQUATIONS

For the sake of simplicity we shall limit ourselves to a system of four normal equations of the form

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$
$$[ac]x_1 + [bc]x_2 + [cc]x_3 + [cd]x_4 - [co] = 0$$
$$[ad]x_1 + [bd]x_2 + [cd]x_3 + [dd]x_4 - [do] = 0$$

The generalization to an arbitrary number of unknowns offers no difficulties, however.

On account of their symmetrical form the above equations may also be written in the more convenient form, viz.:

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$
$$[bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$
$$[cc]x_3 + [cd]x_4 - [co] = 0$$
$$[dd]x_4 - [do] = 0$$

From the first equation we find

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4.$$

Substituting this value in the following equations and by the introduction of the new symbol

$$[ik] - \frac{[ai]}{[aa]}[ak] = [ik.1]$$

we now obtain a new system of equations of a lower order and of the form

$$[bb.1]x_2 + [bc.1]x_3 + [bd.1]x_4 - [bo.1] = 0$$
$$[cc.1]x_3 + [cd.1]x_4 - [co.1] = 0$$
$$[dd.1]x_4 - [do.1] = 0$$

Solving for $x_2$ we have

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4.$$

Substituting in the following equations and writing

$$[ik.1] - \frac{[bi.1]}{[bb.1]}[bk.1] = [ik.2]$$

we have

$$[cc.2]x_3 + [cd.2]x_4 = [co.2]$$
$$[dd.2]x_4 = [do.2]$$

or

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4.$$

Moreover, by writing

$$[ik.2] - \frac{[ci.2]}{[cc.2]}[ck.2] = [ik.3],$$

we have finally

$$[dd.3]x_4 = [do.3].$$

This gives us the final reduced normal equation of the lowest order. By successive substitution we therefore have:

$$x_4 = \frac{[do.3]}{[dd.3]}$$
$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4$$
$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4$$
$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4$$

as the ultimate solution of the unknowns.
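The successive elimination and back-substitution can be sketched in code as follows. This is a minimal illustration without pivoting, which assumes nonzero leading sum products as is usual for well-posed normal equations; the function name is hypothetical. As a usage example it takes the three normal equations that appear in the arithmetical example of paragraph 17, with the constant terms moved to the right-hand side.

```python
def gauss_solve(matrix, rhs):
    """Solve matrix * x = rhs by successive elimination and back-substitution.

    No pivoting is performed: the sketch assumes every leading coefficient
    ([aa], [bb.1], [cc.2], ...) is different from zero.
    """
    n = len(rhs)
    a = [[float(v) for v in row] for row in matrix]
    o = [float(v) for v in rhs]
    for p in range(n):
        for i in range(p + 1, n):
            factor = a[i][p] / a[p][p]
            for k in range(p, n):
                # reduced sum product: [ik.(p+1)] = [ik.p] - factor * [pk.p]
                a[i][k] -= factor * a[p][k]
            o[i] -= factor * o[p]
    x = [0.0] * n
    for p in range(n - 1, -1, -1):
        tail = sum(a[p][k] * x[k] for k in range(p + 1, n))
        x[p] = (o[p] - tail) / a[p][p]
    return x


# Normal equations of paragraph 17 (unknowns r_0, r_3, r_4).
normal = [[12419, 0, 1501], [0, 324, -96], [1501, -96, 989]]
constants = [14483, 1182, 1265]
r0, r3, r4 = gauss_solve(normal, constants)

if __name__ == "__main__":
    print(round(r0, 5), round(r3, 5), round(r4, 5))
```

The result agrees with the hand solution of paragraph 17 (1.18709, 3.59637 and -.17308) to about three decimals; the small differences come from the rounding of the intermediate figures in the printed schedules.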
17. ARITHMETICAL APPLICATION OF METHOD

The example in paragraph 13 gave an illustration of the application of the method of moments. As previously stated this method works quite well in cases of moderate skewness, but is less successful in extremely skew curves and where the excess is large. We shall now give an illustration of the calculation of the parameters by the method of least squares. The example we choose is the well-known statistical series by the distinguished Dutch botanist, de Vries, on the number of petals in flowers of Ranunculus bulbosus. This is also one of the classical examples of Karl Pearson in his celebrated original memoirs on skew variation. Although the observations of de Vries lend themselves more readily to the method of logarithmic transformation, which we shall discuss in a following chapter, we have deliberately chosen to use it here for two specific reasons. Firstly it is a most striking illustration, in refutation of the immature criticism of the Gram-Charlier series by a certain young and very incautious American actuary, Mr. M. Davis, who has gone on record with the positive statement, "that the Charlier series fails completely in case of appreciable skewness". Secondly (and this is the more important reason) it offers an excellent drill for the student in the practical applications of the method of least squares because it gives in a very brief compass all the essential arithmetical details. The observations of de Vries are as follows:


No. of petals      x      F(x) = o_x
      5            0         133
      6            1          55
      7            2          23
      8            3           7
      9            4           2
     10            5           2

where $F(x)$ denotes the absolute frequencies. The observed frequency distribution is well nigh as skew as it can be and represents in fact a one-sided curve, and should therefore (if the statement by Mr. Davis is correct) show an absolute defiance to a graduation by the Gram-Charlier series.

The process we shall use in the attempted mathematical representation of the above series is a combination of the method of semi-invariants and the method of least squares. Following Thiele's advice we determine the first two semi-invariants in the generating function directly from the observations, while the coefficients of this function and its derivatives are determined by the least square method.

Choosing the provisional origin at 5, we obtain the following values for the crude moments:

$$s_0 = 222, \quad s_1 = 140, \quad s_2 = 292, \quad s_3 = 806, \quad s_4 = 2{,}752,$$
$$s_5 = 10{,}790, \quad s_6 = 46{,}072, \quad s_7 = 207{,}226,$$

from which we find that

$$\lambda_1 = s_1 : s_0 = 0.631, \quad \lambda_2 = 0.917, \quad \lambda_3 = 1.644,$$
$$\lambda_4 = 3.377, \quad \lambda_5 = 5.972, \quad \lambda_6 = -2.911,$$
$$\lambda_7 = 122.638.$$

All these semi-invariants with the exception of the first two are, however, so greatly influenced by random sampling in the small observation series that it is hopeless to use them in the determination of the constants in the Gram-Charlier series. In fact an actual calculation does not give a very good result beyond that of a first rough approximation. The generating function, on the other hand, may be expressed by the aid of the first two semi-invariants as follows:

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2 : 2},$$

where $z$ is given by the linear transformation:

$$z = (x - 0.631) : 0.9576 \qquad (\sqrt{\lambda_2} = 0.9576).$$

We now propose to express the observed function $F(x)$ or $\varphi(z)$ by a Gram-Charlier series of the form:

$$F(x) = \varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z).$$

In this equation we know the values of the generating function and its derivatives for various values of the variate $z$ as found in the tables of Jørgensen and Charlier, while the quantities $k$ are unknowns. On the other hand we know 6 specific values of $F(x)$ as directly observed in de Vries's observation series. We are thus dealing with a system of typical linear observation equations of the forms described in paragraphs 15 and 16, which lend themselves so admirably to the treatment by the method of least squares.

From the above linear relation between $x$ and $z$ we can directly compute the following table for the transformed variate $z$.

   x          z
   0       -0.688
   1       +0.402
   2       +1.493
   3       +2.583
   4       +3.674
   5       +4.764
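A recomputation of this table is a useful check on the arithmetic. The sketch below is an observation from recomputation, not something the text states: the printed values of $z$ are reproduced when the divisor is taken as 0.917 (the printed value of $\lambda_2$), whereas the stated divisor 0.9576 gives somewhat smaller values. The constant `DIVISOR` is therefore an assumption of this illustration.

```python
MEAN = 0.631
STATED_DIVISOR = 0.9576   # square root of lambda_2, as stated in the text
DIVISOR = 0.917           # assumption: this value reproduces the printed table

tabulated = [-0.688, 0.402, 1.493, 2.583, 3.674, 4.764]
recomputed = [(x - MEAN) / DIVISOR for x in range(6)]
with_stated_divisor = [(x - MEAN) / STATED_DIVISOR for x in range(6)]

if __name__ == "__main__":
    for x in range(6):
        print(f"x = {x}   printed {tabulated[x]:+.3f}   "
              f"divisor 0.917: {recomputed[x]:+.3f}   "
              f"divisor 0.9576: {with_stated_divisor[x]:+.3f}")
```

Since the observation equations that follow are built on the printed values of $z$, the later schedules remain internally consistent with this table.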

The numerical values of $\varphi_0(z)$ and its derivatives as corresponding to the above values of $z$ can be taken directly from the standard tables of Jørgensen and Charlier. We may therefore write down the following observation equations:

    φ_0            φ_3            φ_4           o
 .3148 k_0    - .5472 k_3    + .1207 k_4    - 133 = 0
 .3679 k_0    + .4198 k_3    + .7566 k_4    -  55 = 0
 .1308 k_0    + .1506 k_3    - .7073 k_4    -  23 = 0
 .0145 k_0    - .1346 k_3    + .1062 k_4    -   7 = 0
 .0005 k_0    - .0180 k_3    + .0486 k_4    -   2 = 0
 .0001 k_0    - .0005 k_3    + .0020 k_4    -   2 = 0

for which we now propose to determine the unknown values of $k$ by the least square method.
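In place of the printed tables, the coefficients of the observation equations can be computed directly. The sketch below assumes that $\varphi_3$ and $\varphi_4$ denote the third and fourth derivatives of the normal function, i.e. $-(z^3 - 3z)\varphi_0$ and $(z^4 - 6z^2 + 3)\varphi_0$, and compares the first three rows with the printed figures; the names are illustrative.

```python
from math import exp, pi, sqrt


def phi0(z):
    return exp(-z * z / 2) / sqrt(2 * pi)


def phi3(z):
    # third derivative of phi0 (assumed convention)
    return -(z ** 3 - 3 * z) * phi0(z)


def phi4(z):
    # fourth derivative of phi0 (assumed convention)
    return (z ** 4 - 6 * z ** 2 + 3) * phi0(z)


z_values = [-0.688, 0.402, 1.493]
printed = [
    (0.3148, -0.5472, 0.1207),
    (0.3679, 0.4198, 0.7566),
    (0.1308, 0.1506, -0.7073),
]
computed = [(phi0(z), phi3(z), phi4(z)) for z in z_values]

if __name__ == "__main__":
    for z, row in zip(z_values, computed):
        print(f"z = {z:+.3f}   " + "   ".join(f"{v:+.4f}" for v in row))
```

The agreement to about three decimals suggests that the sign convention assumed here is the one used in the tables.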

While this method may of course be applied directly to the above data, it will generally be found of advantage to start with some approximate values of the $k$'s. It is found in practice that this approximate step saves considerable labour in the formation and ultimate solution of the normal equations.

Although the first approximation in the case of numerous unknowns must be in the nature of a more or less shrewd guess, which facility can only be attained by constant practice in routine mathematical computing, we are, however, in this specific instance able to tell something about the nature of the coefficients from purely a priori considerations. We know for instance from the form of the Gram-Charlier series that the coefficient $k_0$ of the generating function must be nearly equal to the area of the curve, which in this particular instance is 222. Moreover, a mere glance at the observed series tells us that it has a decidedly large skewness in negative direction from the mean coupled with a tendency of being "top heavy", indicating positive excess. We can therefore assume as a first approximation that the coefficients of the derivatives of uneven order are negative and the coefficients of derivatives of even order are positive.

From such purely common sense a priori considerations we therefore guess the following first approximations, viz.:

$$k_0^0 = 222, \quad k_3^0 = -25, \quad k_4^0 = 30.$$

The probable values of the various $k$'s may be written as

$$k_i = r_ik_i^0 \quad\text{for } i = 0, 3, 4,$$

and our problem is therefore to find the correction factor $r_i$ with which the approximate value $k_i^0$ must be multiplied so as to give $k_i$.

Applying the various values of $k_i^0$ to the original observation equations on page 64 we obtain the following schedule for the numerical factors of $r_i$.


     a         b         c          o          s
   69.9     +13.7      +3.6     -133.0      -45.8
   81.7     -10.5     +22.7      -55.0      +38.9
   29.1      -3.8     -21.2      -23.0      -18.9
    3.3      +3.4      +3.2       -7.0       +2.9
    0.1      +0.5      +1.5       -2.0       +0.1
    0.0      +0.0      +0.0       -2.0       -2.0
  184.1      +3.3      +9.8     -222.0      -24.8

where the additional control column $s$ serves as a check.

The subsequent formation of the various sum-products and normal equations is shown in the following schedules together with the $s$ columns as a check.


      aa        ab        ac         ao        as
  +4,886      +958      +252     -9,297    -3,201
  +6,675      -858    +1,855     -4,494    +3,178
    +847      -111      -617       -669      -550
     +11       +11       +11        -23       +10
      +0        +0        +0         -0        +0
      +0        +0        +0         -0        +0
 +12,419        +0    +1,501    -14,483      -563

      bb        bc        bo        bs
    +188       +49    -1,822      -628
    +110      -238      +578      -408
     +14       +81       +87       +72
     +12       +11       -24       +10
      +0        +0        -1        +0
      +0        +0        -0        +0
    +324       -96    -1,182      -954

      cc        co        cs
     +13      -479      -165
    +515    -1,249      +883
    +449      +488      +401
     +10       -22        +9
      +2        -3        +1
      +0        +0        +0
    +989    -1,265    +1,129
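The sum-products can be re-formed from the schedule of factors $a$, $b$, $c$, $o$ and the control column $s$. The sketch below is illustrative (names are hypothetical) and works with unrounded products, so the totals may differ from the printed ones by about a unit, since the hand computation rounds each product before adding.

```python
a = [69.9, 81.7, 29.1, 3.3, 0.1, 0.0]
b = [13.7, -10.5, -3.8, 3.4, 0.5, 0.0]
c = [3.6, 22.7, -21.2, 3.2, 1.5, 0.0]
o = [-133.0, -55.0, -23.0, -7.0, -2.0, -2.0]
# control column: s = a + b + c + o, row by row
s = [ai + bi + ci + oi for ai, bi, ci, oi in zip(a, b, c, o)]


def sum_product(u, v):
    """Gaussian bracket [uv] = u_1 v_1 + u_2 v_2 + ..."""
    return sum(p * q for p, q in zip(u, v))


aa, ab, ac, ao = (sum_product(a, v) for v in (a, b, c, o))
bb, bc, bo = (sum_product(b, v) for v in (b, c, o))
cc, co = (sum_product(c, v) for v in (c, o))
a_s = sum_product(a, s)

if __name__ == "__main__":
    print("[aa] [ab] [ac] [ao]:", [round(v) for v in (aa, ab, ac, ao)])
    print("[bb] [bc] [bo]:     ", [round(v) for v in (bb, bc, bo)])
    print("[cc] [co]:          ", [round(v) for v in (cc, co)])
    print("control [as] - ([aa]+[ab]+[ac]+[ao]):", a_s - (aa + ab + ac + ao))
```

The control column works because $[as] = [aa] + [ab] + [ac] + [ao]$ identically, so any disagreement in a row total exposes a slip in one of the products of that row.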
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We  may  now  write  the  normal  equations  in 
schedule  form  as  follows  : 

ORIGINAL  NORMAL  EQUATIONS 
(a)       +12,419       +         0       +    1501      —    14483 

(1)  +0+0—0 
(6)                         +      324      —       96      —     1182 

(2)  +      181      —     1750 
(c)                                           +      989      —     1265 

73)  +.00000^^712^6      —1.16617 

The sum-products from the observation equations are shown in the rows marked (a), (b), (c). The row marked (3) and printed in italics is formed by dividing each of the figures in row (a) by 12,419. The row marked (1) contains the products of the figures in row (a) and the factor .00000; all these products happen in this case to be equal to zero. Row (2) contains the products of the factor 0.12086 and the figures in row (a).

We next subtract row (1) from row (b) and row (2) from row (c), which results in the following schedule, known as the first reduction equations.

FIRST REDUCTION EQUATIONS

(a)        +324         −96      −1,182
(1)                     +28        +350
(b)                    +808        +485
(2)                 −.29626    −3.64814

The above equations are treated in a similar manner as the original normal equations, and we have therefore the second reduction equation of the form:

SECOND REDUCTION EQUATION

           +780        +135

The solution for the unknown r's may now be shown as follows:

$$r_3 = -135 : 780 = -.17308$$
$$r_2 = 3.64814 - (-.29626)(-.17308) = 3.59637$$
$$r_1 = 1.16617 - (0.0)(3.59637) - (.12086)(-.17308) = 1.18709.$$
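The schedule above is ordinary elimination followed by back-substitution, and it can be checked mechanically. The following is a minimal sketch, not the author's own procedure: the function name `solve_by_elimination` is hypothetical, the coefficients are the column totals of the sum-product tables, and the constant terms are moved to the right-hand side with their signs reversed. Because the schedule works with rounded figures, the results should agree with the text only to about three decimals.

```python
def solve_by_elimination(rows):
    """Solve a small linear system by elimination and back-substitution.

    Each row holds the coefficients followed by the right-hand side.
    No pivoting is done, so the diagonal entries must stay non-zero.
    """
    a = [[float(v) for v in row] for row in rows]
    n = len(a)
    # Forward elimination: clear the entries below each diagonal element.
    for i in range(n):
        for j in range(i + 1, n):
            factor = a[j][i] / a[i][i]
            for k in range(i, n + 1):
                a[j][k] -= factor * a[i][k]
    # Back-substitution, starting from the last unknown.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (a[i][n] - rest) / a[i][i]
    return x


# Symmetric normal equations built from the totals aa, ab, ac, bb, bc, cc,
# with -ao, -bo, -co as right-hand sides.
normal_equations = [
    [12419, 0, 1501, 14483],
    [0, 324, -96, 1182],
    [1501, -96, 989, 1265],
]

unknowns = solve_by_elimination(normal_equations)
print([round(v, 4) for v in unknowns])
```

The three values are returned in the order of rows (a), (b), (c), that is, in the reverse of the order in which the back-substitution produces them.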

From which we find:

$$k_0 = 263.5, \qquad k_3 = -89.9, \qquad k_4 = -5.1$$

Applying these factors to the values of $\varphi_0(z)$, $\varphi_3(z)$ and $\varphi_4(z)$ we obtain the following result:¹

 k_0 φ_0     k_3 φ_3     k_4 φ_4    Σ k_i φ_i     Obs.

   82.9       +49.2        −0.6       131.5        133
   96.9       −37.7        −3.9        55.3         55
   34.5       −13.5        +3.6        24.6         23
    3.8       +12.1        −0.5        15.4          7
    0.1        +1.0        −0.2         0.9          2
    0.0        +0.0        −0.0         0.0          2

¹ For a closer approximation see my Mathematical Theory of Probabilities (Second Edition, New York, 1921).

18. TRANSFORMATION OF THE VARIATE

While it is always possible to express all frequency curves by an expansion in Hermite polynomials, the numerical labor when carried on by the method of least squares often involves a large amount of arithmetical work if we wish to retain more than four or five terms of the series. Other methods lessening the arithmetical work and making the actual calculations comparatively simple have been offered by several authors and notably by Thiele, who in his works discusses several such methods. Among those we may mention the method of the so-called free functions and orthogonal substitution, the method of correlates and the adjustment by elements. The chapters on these methods in Thiele's work are among the most important, but also among the most difficult, in the whole theory of observations and have not always been understood and appreciated by mathematicians, chiefly on account of Thiele's peculiar style of writing. A close study of the Danish scholar's investigations is, however, well worth while, and Thiele's work along these lines may still in the future become as epoch-making in the theory of probability as some of the researches of the great Laplace. The theory of infinite determinants as used by M. Fredholm in the solution of integral equations is another powerful tool which offers great advantages in the way of rapid calculation. All these methods require, however, that the student be thoroughly familiar with the difficult theory upon which such methods rest, and they have for this reason been omitted in an elementary work such as the present treatise.

We wish, however, to mention another method which in the majority of cases will make it possible to employ the Gram or Laplacean-Charlier curves in cases with extreme skewness or excess. We have here reference to the method of logarithmic transformation of the variate, x.


19. THE GENERAL TRANSFORMATION

One of the simplest transformations is the previously mentioned linear transformation of the form $z = f(x) = ax + b$, by which we can make two constants, $c_1$ and $c_2$, vanish. Other transformations suggest themselves, however, such as $f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$, $f(x) = \log x$ and so forth. For this reason I propose to give a brief development of the general method of transformations of the statistical variates, mainly following the methods of Charlier and Jørgensen.

Stated in its most general form our problem is: If a frequency curve of a certain variate is given by $F(x)$, what will be the frequency curve of a certain function of $x$, say $f(x)$?

The equation of the frequency curve is $y = F(x)$, which means that $F(x)\,dx$ is the probability that $x$ falls in the interval between $x - \tfrac12 dx$ and $x + \tfrac12 dx$. The probability that a new variate $z$ after the transformation $z = f(x)$, or $x = \chi(z)$, falls in the interval between $z - \tfrac12 dz$ and $z + \tfrac12 dz$ is therefore simply

$$F[\chi(z)]\,\chi'(z)\,dz = F(x)\,dx,$$

which gives in symbolic form the equation of the transformed frequency curve.

The frequency for $z = f(x)$ is of course the same as for $x$. The ordinates of the frequency curve, or rather the areas between corresponding ordinates, are therefore not changed, but the abscissa axis is replaced by $f(x)$. Equidistant intervals of $x$ will therefore not as a rule (except in the linear transformation) correspond to equidistant intervals of $f(x)$.
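The statement that areas between corresponding ordinates are unchanged can be illustrated numerically. The sketch below is an illustration under assumptions and is not taken from the text: the curve `F` is an arbitrary skew example, the transformation is z = log x with inverse x = exp(z), and `simpson` is a hypothetical helper for the quadrature.

```python
import math


def simpson(g, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def F(x):
    # Assumed example of a skew frequency curve of x (not from the text).
    return math.exp(-x)


def transformed(z):
    # Transformed curve F[chi(z)] * chi'(z) for chi(z) = exp(z).
    return F(math.exp(z)) * math.exp(z)


# Area between the ordinates x = 1 and x = 2 ...
area_x = simpson(F, 1.0, 2.0)
# ... and between the corresponding ordinates z = log 1 and z = log 2.
area_z = simpson(transformed, math.log(1.0), math.log(2.0))
print(area_x, area_z)
```

The two areas agree although the interval of x has length 1 and the interval of z has length log 2, which is the sense in which equidistant intervals do not correspond.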

If, for instance, the frequency curve $F(x)$ is the Laplacean normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2}$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have evidently

$$F(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z / 2\sigma^2}\, \frac{1}{2\sqrt{z}}.$$


20. LOGARITHMIC TRANSFORMATION

Of the various transformations the logarithmic is of special importance. It happens that even if the variate x forms an extremely skew frequency distribution its logarithms will be nearly normally distributed.

This fact was already noted by the eminent German psychologist, Fechner, and also mentioned by Bruhns in his Kollektivmasslehre. But neither Fechner nor Bruhns has given a satisfactory theoretical explanation of the transformation, and both have limited themselves to using it as a practical rule of thumb.

Thiele discusses the method under his adjustment by elements, but in a rather brief manner. The first satisfactory theory of logarithmic transformation seems to have been given by Jørgensen and later on by Wicksell.¹ Jørgensen begins with the transformation of the normal Laplacean frequency curve. Letting $z = \log x$ and bearing in mind that the frequency of $x$ equals that of $\log x$ we have

$$z = f(x) = \log x, \quad\text{or}\quad x = \chi(z) = e^z \quad\text{and}\quad dx = e^z\,dz.$$

¹ The law of errors, leading to the geometric mean as the most probable value of the variate, as discovered by Prof. Dr. Th. N. Thiele in 1867, may, however, be considered as a forerunner of Jørgensen's work.

The continuous power sums or moments of the rth order around the lower limit take on the form

$$M_r = \frac{N}{n\sqrt{2\pi}} \int_0^{\infty} x^r\, e^{-\frac12\left(\frac{\log x - m}{n}\right)^2} dx$$

on the assumption that $\log x$ is normally distributed.

The lower limit of this integral is zero rather than $-\infty$ simply because the logarithm of zero equals minus infinity, and the point $-\infty$ is thus by the transformation moved up to zero.

By a straightforward transformation we may write the above integral as

$$M_r = \frac{N}{\sqrt{2\pi}}\, e^{m(r+1) + \frac12 n^2 (r+1)^2} \int_{-\infty}^{\infty} e^{-\frac12 t^2}\,dt = N e^{m(r+1) + \frac12 n^2 (r+1)^2}.$$


Changing from moments to semi-invariants by means of the well-known relations

$$\lambda_0 = M_0, \qquad \lambda_1 = M_1 : M_0$$
$$\lambda_2 = (M_2 M_0 - M_1^2) : M_0^2$$
$$\lambda_3 = (M_3 M_0^2 - 3 M_2 M_1 M_0 + 2 M_1^3) : M_0^3$$

we have

$$\lambda_0 = N e^{m + \frac12 n^2}$$
$$\lambda_1 = e^{m + 1.5 n^2}$$
$$\lambda_2 = e^{2m + 3n^2}\,(e^{n^2} - 1)$$
$$\lambda_3 = e^{3m + 4.5 n^2}\,(e^{3n^2} - 3e^{n^2} + 2)$$
$$\lambda_4 = e^{4m + 6n^2}\,(e^{6n^2} - 4e^{3n^2} - 3e^{2n^2} + 12e^{n^2} - 6).$$

These equations give the semi-invariants expressed in terms of $m$ and $n$. On the other hand if we know the semi-invariants from statistical data or are able to determine these semi-invariants by a priori reasoning we may find the parameters $m$ and $n$.
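The closed forms for the moments and the first semi-invariants can be checked against a direct numerical integration. This is a minimal sketch under assumptions: the function names are hypothetical, the parameter values m = 1.0 and n = 0.3 are arbitrary, and the integral is taken in z = log x, where the integrand is a bell curve centred at m + (r+1)n².

```python
import math


def moment(r, m, n, N=1.0):
    """Closed form M_r = N * exp(m(r+1) + n^2 (r+1)^2 / 2)."""
    return N * math.exp(m * (r + 1) + 0.5 * n * n * (r + 1) ** 2)


def moment_numeric(r, m, n, N=1.0, steps=4000):
    """The same moment by Simpson's rule after substituting z = log x."""
    centre = m + (r + 1) * n * n
    a, b = centre - 12 * n, centre + 12 * n
    h = (b - a) / steps

    def g(z):
        return math.exp((r + 1) * z - 0.5 * ((z - m) / n) ** 2)

    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return N * total * h / 3 / (n * math.sqrt(2 * math.pi))


def semi_invariants(m, n):
    """First three semi-invariants from the moment relations."""
    M0, M1, M2, M3 = (moment(r, m, n) for r in range(4))
    l1 = M1 / M0
    l2 = (M2 * M0 - M1 ** 2) / M0 ** 2
    l3 = (M3 * M0 ** 2 - 3 * M2 * M1 * M0 + 2 * M1 ** 3) / M0 ** 3
    return l1, l2, l3


m, n = 1.0, 0.3          # arbitrary illustrative parameters
e = math.exp(n * n)
l1, l2, l3 = semi_invariants(m, n)
print(l1, math.exp(m + 1.5 * n * n))
print(l2, math.exp(2 * m + 3 * n * n) * (e - 1))
print(l3, math.exp(3 * m + 4.5 * n * n) * (e ** 3 - 3 * e + 2))
```

Each printed pair should agree to many decimals; the first value comes from the moment relations and the second from the closed form in m and n.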

21. THE MATHEMATICAL ZERO

A point which we must bear in mind is that the above semi-invariants on account of the transformation are calculated around a zero point which corresponds to a fixed lower limit of the observations.

Very often the observations themselves indicate such a lower limit beyond which the frequencies of the variate vanish. In the case of persons engaged in factory work there is in most countries a well-defined legal age limit below which it is illegal to employ persons for work. Another example is offered in the number of alpha particles radiated from certain radioactive metals. Since the number of particles radiated in a certain interval of time must either be zero or a whole positive number it is evident that −1 must be the lower limit because we can have no negative radiations. Analogous limits exist in the age limit for divorces and in the amount of moneys assessed in the way of income tax.

The lower limit allows, however, of a more exact mathematical determination by means of the following simple considerations. It is evident that this lower limit must fall below the mean value of the frequency curve. Let us suppose that it is located at a point, $a$, situated say $\eta$ units in negative direction from the mean, $M = \lambda_1$, and let us to begin with select $\lambda_1$ as the origin of the coordinate system, in which case the first semi-invariant, $\lambda_1$, is equal to zero. Transferring the origin to $a$ the first semi-invariant equals $\eta$, while the semi-invariants of higher order remain the same as before the transformation and we have:

$$\lambda_1 = \eta = e^{m + 1.5 n^2}$$
$$\lambda_2 = \eta^2 (e^{n^2} - 1) \quad\text{or}\quad e^{n^2} = 1 + \lambda_2 : \eta^2$$
$$\lambda_3 = \eta^3 (e^{3n^2} - 3e^{n^2} + 2) = \eta^3 \left[\frac{\lambda_2^3}{\eta^6} + \frac{3\lambda_2^2}{\eta^4}\right]$$

which reduces to $\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3$.

The solution of this cubic equation, which has one real and two imaginary roots, gives us the value of $\eta$ or $\lambda_1 - a$ and thus determines the mathematical zero or lower limit. We have in fact

$$n^2 = \log(1 + \lambda_2 : \eta^2) \quad\text{and}\quad m = \log\eta - 1.5\,n^2.$$
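The determination of the mathematical zero can be sketched as a small round trip: start from known parameters, form the second and third semi-invariants, solve the cubic for its positive root, and recover the parameters. This is an illustration under assumptions rather than the author's computing scheme: the root is found by bisection, the function names are hypothetical, both semi-invariants are assumed positive, and the values 1.0 and 0.3 are arbitrary.

```python
import math


def mathematical_zero(l2, l3, iterations=200):
    """Positive root eta of l3*eta^3 - 3*l2^2*eta^2 = l2^3 by bisection.

    Assumes l2 > 0 and l3 > 0 (a positively skew curve).
    """
    def f(eta):
        return l3 * eta ** 3 - 3 * l2 ** 2 * eta ** 2 - l2 ** 3

    lo = 2 * l2 ** 2 / l3   # local minimum of f, where f is negative
    hi = 2 * lo
    while f(hi) <= 0:       # widen the bracket until the sign changes
        hi *= 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def parameters(l2, l3):
    """Return (eta, m, n) from the second and third semi-invariants."""
    eta = mathematical_zero(l2, l3)
    n2 = math.log(1 + l2 / eta ** 2)
    m = math.log(eta) - 1.5 * n2
    return eta, m, math.sqrt(n2)


# Round trip from assumed parameters.
m_true, n_true = 1.0, 0.3
e = math.exp(n_true ** 2)
eta_true = math.exp(m_true + 1.5 * n_true ** 2)
l2 = eta_true ** 2 * (e - 1)
l3 = eta_true ** 3 * (e ** 3 - 3 * e + 2)

eta, m, n = parameters(l2, l3)
print(eta, m, n)
```

Bisection is used only because it needs no starting guess; any root finder for the cubic would serve equally well.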
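22. LOGARITHMICALLY TRANSFORMED FREQUENCY SERIES

We have already shown that the generalized frequency curve could be written as a series in the generating function and its derivatives, where the Laplacean probability function

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-M)^2 / 2\sigma^2}$$

is the generating function with $M$ and $\sigma$ as its parameters.

The suggestion now immediately arises to use an analogous series in the case of the logarithmic transformation. In this case the frequency curve, $F(x)$, with a lower limit would be expressed as follows:

$$F(x) = k_0\Phi_0(x) - \frac{k_1}{1!}\Phi_1(x) + \frac{k_2}{2!}\Phi_2(x) - \frac{k_3}{3!}\Phi_3(x) + \dots$$

while the generating function now is

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac12\left[\frac{\log x - m}{n}\right]^2}$$

where $m$ and $n$ are the parameters.

¹ $n! = \lfloor n$.

Using the usual definition of semi-invariants we then have

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \dots = \int_0^{\infty} e^{x\omega} F(x)\,dx.$$

The general term in the right-hand integral is of the form

$$(-1)^s\, \frac{k_s}{s!} \int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx$$

where the integral may be evaluated by partial integration as follows:

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = \Big[e^{x\omega}\,\Phi_{s-1}(x)\Big]_0^{\infty} - \omega\int_0^{\infty} e^{x\omega}\,\Phi_{s-1}(x)\,dx.$$

Since both $\Phi_0(x)$ and all its derivatives are supposed to vanish for $x = 0$ and $x = \infty$ the first term to the right becomes zero and

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = -\omega\int_0^{\infty} e^{x\omega}\,\Phi_{s-1}(x)\,dx.$$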

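By successive integrations we then obtain the following recursion formula

$$(-\omega)\int_0^{\infty} e^{x\omega}\,\Phi_{s-1}(x)\,dx = (-\omega)^2\int_0^{\infty} e^{x\omega}\,\Phi_{s-2}(x)\,dx$$
$$(-\omega)^2\int_0^{\infty} e^{x\omega}\,\Phi_{s-2}(x)\,dx = (-\omega)^3\int_0^{\infty} e^{x\omega}\,\Phi_{s-3}(x)\,dx$$
$$\dots$$
$$(-\omega)^{s-1}\int_0^{\infty} e^{x\omega}\,\Phi_1(x)\,dx = (-\omega)^s\int_0^{\infty} e^{x\omega}\,\Phi_0(x)\,dx.$$

Or finally

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = (-\omega)^s\int_0^{\infty} e^{x\omega}\,\Phi_0(x)\,dx.$$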

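Expanding $e^{x\omega}$ in a power series we have

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = \frac{(-\omega)^s}{n\sqrt{2\pi}}\int_0^{\infty}\left[1 + x\omega + \frac{x^2\omega^2}{2!} + \frac{x^3\omega^3}{3!} + \dots\right] e^{-\frac12\left[\frac{\log x - m}{n}\right]^2} dx.$$

The general term in this expansion is of the form

$$\frac{(-\omega)^s\,\omega^r}{n\sqrt{2\pi}\; r!}\int_0^{\infty} x^r\, e^{-\frac12\left[\frac{\log x - m}{n}\right]^2} dx$$

which according to the formulas given on page 74 reduces to:

$$(-\omega)^s\, e^{m(r+1) + \frac12 n^2 (r+1)^2}\, \omega^r : r!$$

Hence we may write

$$\int_0^{\infty} e^{x\omega}\,\Phi_s(x)\,dx = (-\omega)^s \sum_{r=0}^{\infty} e^{m(r+1) + \frac12 n^2 (r+1)^2}\, \frac{\omega^r}{r!}.$$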

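Consequently the relation between the semi-invariants and the frequency function

$$F(x) = k_0\Phi_0(x) - \frac{k_1}{1!}\Phi_1(x) + \frac{k_2}{2!}\Phi_2(x) - \frac{k_3}{3!}\Phi_3(x) + \dots$$

can be expressed by the following recursion formula

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = \sum_{v=0}^{\infty} \frac{s_v\,\omega^v}{v!} = \sum_{s=0}^{\infty} \frac{k_s\,\omega^s}{s!} \sum_{r=0}^{\infty} e^{m(r+1) + \frac12 n^2 (r+1)^2}\, \frac{\omega^r}{r!}.$$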


The constants $k$ are here expressed in terms of the unadjusted moments or power sums, $s$. It is readily seen that the Sheppard corrections for adjusted moments, $M$, also apply in this case. We are, therefore, able to write down the values of the $k$'s from the above recursion formula in the following manner

$$M_1 = k_1 e^{m + \frac12 n^2} + k_0 e^{2m + 2n^2}$$

It is easy to see that it is not possible to determine the generating function's parameters $m$ and $n$ from the observations. These parameters, like $M$ and $\sigma$ in the case of the Laplacean normal probability curve, must be chosen arbitrarily. If $m$ and $n$ are selected so as to make $k_1$ and $k_2$ vanish we have

$$M_0 = k_0 e^{m + \frac12 n^2}$$
$$M_1 = k_0 e^{2m + 2n^2}$$
$$M_2 = k_0 e^{3m + 4.5 n^2}$$

the solution of which gives

$$e^{n^2} = \frac{M_2 M_0}{M_1^2}.$$

This theory requires the computation of a set of tables of the generating function

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac12\left[\frac{\log x - m}{n}\right]^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course use the ordinary tables for the normal curve $\varphi_0(z)$ when we consider

$$z = \frac{\log x - m}{n}.$$

I have calculated a set of tables of the derivatives of $\Phi_0(x)$ and hope to be able to publish the manuscript thereof in the second volume of my treatise on "The Mathematical Theory of Probabilities".

23. PARAMETERS DETERMINED BY LEAST SQUARES

The above development is based upon the theory of functions and the theory of definite integrals. We shall now see how the same problem may be attacked by the method of least squares after we have determined by the usual method of moments the values of $m$ and $n$ in the generating function $\varphi_0(z)$.

Viewed from this point of vantage our problem may be stated as follows:

Given an arbitrary frequency distribution of the variate $z$ with $z = (\log x - m) : n$ and where $x$ is reckoned from a zero point or origin, which is situated $\eta$ units below the mean and defined by the relation

$$\eta^3\lambda_3 - 3\eta^2\lambda_2^2 = \lambda_2^3 \quad\text{where}\quad a = \lambda_1 - \eta;$$

to develop $F(z)$ into a frequency series in $\varphi_0(z)$ and its derivatives, where the $k$'s must be determined in such a way that the expression

$$\sum k_i\,\varphi_i(z)$$

gives the best approximation to $F(z)$ in the sense of the method of least squares.

Stated in this form the frequency function is reduced to the ordinary series of Gram or the A type of the Charlier series, already treated in the earlier chapters.


24. APPLICATION TO GRADUATION OF A MORTALITY TABLE

As an illustration of the theory applied to a practical problem we present the following frequency distribution by 5-year age intervals of the number of deaths (or $\Sigma d_x$ by quinquennial grouping) in the recently published American-Canadian Mortality of Healthy Males, based on a radix of 100,000 entrants at age 15.

Frequency Distribution of Deaths by Attained Ages in American-Canadian Mortality Table.

Ages          Σd_x     1st Component    2d Component

 15- 19       1,801           120           1,681
 20- 24       1,996           230           1,766
 25- 29       2,089           440           1,649
 30- 34       2,120           790           1,330
 35- 39       2,341         1,370             971
 40- 44       2,911         2,270             641
 45- 49       3,937         3,570             367
 50- 54       5,527         5,400             127
 55- 59       7,723         7,722               1
 60- 64      10,383        10,383
 65- 69      12,987        12,987
 70- 74      14,535        14,535
 75- 79      13,807        13,807
 80- 84      10,328        10,328
 85- 89       5,464         5,464
 90- 94       1,757         1,757
 95- 99         278           278
100-104          16            16

            100,000        91,467           8,533

The curve represented by the $d_x$ column is evidently a composite frequency function compounded of several series. From a purely mathematical point of view the compound curve may be considered as being generated in an infinite number of ways as the summation of separate component frequency curves. From the point of view of a practical graduation it is, however, easy to break this compound death curve up into two separate components. A mere glance at the $d_x$ curve itself suggests a major skew frequency curve with a maximum point somewhere in the age interval from 70 to 75 and a minor curve (practically one-sided) for the younger ages.

Let us therefore break the $\Sigma d_x$ column up into the two so far perfectly arbitrary parts as shown in the above table and then try to fit those two distributions to logarithmically transformed A curves.

Starting with the first component the straightforward computation of the semi-invariants is given in the table below with the provisional mean chosen at age 67.

Frequency Distribution of Deaths in American Mortality Table, First Component.

Ages         x       F(x)      xF(x)     x²F(x)      x³F(x)

104-100     −7         16        112        784        5,488
 99- 95     −6        278      1,668     10,008       60,048
 94- 90     −5      1,757      8,785     43,925      219,625
 89- 85     −4      5,464     21,856     87,424      349,696
 84- 80     −3     10,328     30,984     92,952      278,856
 79- 75     −2     13,807     27,614     55,228      110,456
 74- 70     −1     14,535     14,535     14,535       14,535
 69- 65      0     12,987          0          0            0

                   59,172    105,554    304,856    1,038,704

 64- 60     +1     10,383     10,383     10,383       10,383
 59- 55     +2      7,723     15,446     30,892       61,784
 54- 50     +3      5,400     16,200     48,600      145,800
 49- 45     +4      3,570     14,280     57,120      228,480
 44- 40     +5      2,270     11,350     56,750      283,750
 39- 35     +6      1,370      8,220     49,320      295,920
 34- 30     +7        790      5,530     38,710      270,970
 29- 25     +8        440      3,520     28,160      225,280
 24- 20     +9        230      2,070     18,630      167,670
 19- 15    +10        120      1,200     12,000      120,000

                   32,296     88,199    350,565    1,810,037

S_r                91,468    −17,355    655,421      771,333

Computing the semi-invariants by means of the usual formulas in paragraph 13, we have:

$\lambda_1 = -17355 : 91468 = -0.18974$, or mean at age 67 + 5(0.19), that is at age 67.95;

$$\lambda_2 = 655421 : 91468 - \lambda_1^2 = 7.1296$$
$$\lambda_3 = 771333 : 91468 - 3\lambda_1 m_2 + 2\lambda_1^3 = 12.4981,$$

where $m_2 = 655421 : 91468$.
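The power sums and semi-invariants of the first component can be recomputed directly from the tabulated frequencies. The sketch below assumes only the F(x) column of the table, with x counted in 5-year units from the provisional mean at age 67 and negative x denoting the higher ages; the names `freq` and `power_sum` are hypothetical.

```python
# F(x) of the first component, from x = -7 (ages 104-100)
# down to x = +10 (ages 19-15); x = 0 is the group 69-65.
freq = {
    -7: 16, -6: 278, -5: 1757, -4: 5464, -3: 10328, -2: 13807,
    -1: 14535, 0: 12987, 1: 10383, 2: 7723, 3: 5400, 4: 3570,
    5: 2270, 6: 1370, 7: 790, 8: 440, 9: 230, 10: 120,
}


def power_sum(r):
    """Unadjusted power sum of order r about the provisional mean."""
    return sum(f * x ** r for x, f in freq.items())


s0, s1, s2, s3 = (power_sum(r) for r in range(4))

l1 = s1 / s0                          # first semi-invariant
m2 = s2 / s0                          # second moment about the provisional mean
l2 = m2 - l1 ** 2                     # second semi-invariant
l3 = s3 / s0 - 3 * l1 * m2 + 2 * l1 ** 3

# Positive x means younger ages, so a negative l1 moves the mean upward.
mean_age = 67 - 5 * l1

print(s0, s1, s2, s3)
print(round(l1, 5), round(l2, 4), round(l3, 4), round(mean_age, 2))
```

The semi-invariants are in units of the 5-year interval, which is why the mean age is obtained by multiplying the first one by 5.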

In order to determine the mathematical zero or the origin we have to solve the following cubic:

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3, \quad\text{or}$$
$$12.498\,\eta^3 - 152.511\,\eta^2 = 362.47$$

the positive root of which is equal to 12.39. The zero point is therefore found to be situated 12.39 5-year units from the mean or at age 67.95 + 5(12.39), i.e. very nearly at age 130, which we henceforth shall select as the origin of the coordinate system of the first component. We have furthermore

$$12.39 = e^{m + 1.5 n^2} \quad\text{and}\quad 7.1296 = e^{2m + 3n^2}(e^{n^2} - 1) = (12.39)^2\,(e^{n^2} - 1),$$

the solution of which gives $n^2 = 0.04436$, $n = 0.2106$, $m = 2.4504$, all on the basis of a 5-year interval as unit. If we wish to change to a single calendar year unit we must add the natural logarithm of 5, or 1.6094, to the above value of $m$, which gives us $m = 4.0598$, while $n$ remains the same. The above computations furnish us with the necessary material for the logarithmic transformation of the variate $x$ which now may be written as

$$z = [\log(130 - x) - 4.0598] : 0.2106,$$

where $x$ is the original variate or the age at death.

Having thus accomplished the logarithmic transformation we may henceforth write the generating function as

$$\Phi_0(x) = \varphi_0(z) = \frac{1}{0.2106\sqrt{2\pi}}\, e^{-\frac12\left[\frac{\log(130 - x) - 4.0598}{0.2106}\right]^2}.$$

We express now $F(x)$ by the following equation

$$F(x) = k_0\Phi_0(x) + k_3\Phi_3(x) + k_4\Phi_4(x) + \dots$$

or in terms of the transformed $z$:

$$\varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + \dots,$$

and proceed to determine the numerical values of $k$ by the method of least squares.

The numerical calculation required by this method follows precisely along the same lines as described in paragraph 17. I shall for this reason not reproduce these calculations but limit myself to quoting the final results for the various coefficients $k$, which are as follows:¹

$$k_0 = 7361.8; \qquad k_3 = -212.2; \qquad k_4 = -9.6.$$

¹ Interested readers may consult the detailed computations on pages 246-257 in my Mathematical Theory of Probabilities (2nd Edition, New York, 1921).

The final equation of the frequency curve of the first component, $F_I(x)$, is therefore:

$$F_I(x) = 7361.8\,\varphi_0(z) - 212.2\,\varphi_3(z) - 9.6\,\varphi_4(z),$$

where the generating function, $\varphi_0(z)$, is of the form:

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.2106\sqrt{2\pi}}\, e^{-\frac12\left[\frac{\log(130 - x) - 4.0598}{0.2106}\right]^2}.$$

The second component, $F_{II}(x)$, can by means of a similar process be expressed by the equation:

$$F_{II}(x) = 947.4\,\varphi_0(z) - 63.4\,\varphi_3(z) - 30.0\,\varphi_4(z),$$

where

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.12\sqrt{2\pi}}\, e^{-\frac12\left[\frac{\log(x + 68.8) - 4.532}{0.12}\right]^2}.$$

Addition of these two component curves gives us the ultimate compound frequency curve, representing the $d_x$ of the mortality table.

A comparison between the observed values of $d_x$ and the values of $d_x$ as computed from the above equation is shown in graphical form in the attached diagram. Evidently the graduation leaves but little to be desired in the way of closeness of fit.

Figure 1. Diagram showing graduation of $d_x$ column in the AM(5) table by a compound frequency curve of the Gram-Charlier types.


25. BIOLOGICAL INTERPRETATION OF MORTALITY

It appears that the Italian statisticians were the first to break up the $d_x$ curve into a system of five or more component frequency curves, which, however, were all of the normal Laplacean type. Pearson, who in a brilliant essay entitled Chances of Death was the next to attack the problem, employed a system of five skew frequency curves. As early as 1914 I found that from ages above 10 the majority of $d_x$ curves in previously constructed mortality tables could be represented by not more than two skew frequency curves as shown in the above example of the AM(5) table.

Although all such investigations may be very interesting and useful from the point of view of the actuary, we must not overlook the fact that the breaking up of the compound $d_x$ curve in the manner just described is merely an empirical process pure and simple. While such processes undoubtedly represent very neat methods of graduation, a quite different and more important question is whether mathematical work of this kind allows of a biological interpretation. It is evident that from a mere mathematical point of view we may break up the $d_x$ curve into various component parts in an infinite number of ways. But while such breaking up processes may be extremely interesting as actuarial graduations and exercises in pure mathematics, they have evidently little connection with the underlying biological facts of a mortality table. This aspect of the question has been brought out in a very forcible manner by the eminent American biometrician, Raymond Pearl, in his 1920 Lowell Institute Lectures. The whole subject would appear in a quite different light if it were possible to give a biological interpretation of the mathematical analysis and to show that the component frequency curves as derived from pure mathematics have a counterpart in actual life. This, I think, would be very difficult, if not impossible, to establish, because it is not mathematics which determines the conduct or behavior of living organisms. One might, however, view the whole problem from the standpoint of the biologist rather than from the standpoint of the mathematician. The problem then is to ascertain whether the observed biological facts as shown in the collected statistical data allow of a mathematical interpretation, rather than to find a biological interpretation and counterpart of previously established empirical formulae.

It is to this important question that I have devoted the entire discussion of the second chapter of this book. I have proceeded from certain observed biological facts (in this particular instance the statistics on the number of deaths by sex and attained ages from more than 150 causes of death) which represent the natural phenomena under investigation. In order to offer a rational explanation of these facts and to interpret their quantitative relationships, I have adopted as a working hypothesis the supposition that the number of deaths according to attained age and sex among the survivors of a homogeneous cohort of say 1,000,000 entrants at age 10 tend to cluster around specific ages in such a manner that their frequency distribution by attained ages can be represented by a limited number of sets of Gram-Charlier or Poisson-Charlier frequency curves.

On  the  basis  of  this  hypothesis  we  can  now 
by  simple  mathematical  deductions  construct  a 
mortality  table  from  deaths  by  sex,  age  and  cause 
of  death  and  without  any  information  about  the 
lives  exposed  to  risk  at  various  ages. 

Finally  we  can  verify  the  ultimate  results 
contained  in  this  final  mortality  table  by  working 
back  from  the  table  to  the  data  originally 
observed. 

This procedure is in strict conformity with the model of modern science, which according to Jevons consists of the four processes of observation, hypothesis, deduction and verification.

The  important  factor  in  this  investigation, 
and  one  which  most  actuaries  and  statisticians 
fail  to  grasp,  is  that  I  have  looked  at  the  whole 
problem  as  a  biometrician  rather  than  as  a 
mathematician.  Mathematics  has  been  employed 
only  as  a  working  tool  in  the  whole  process,  and 
the  reason  that  the  method  has  met  with  success 
must  be  sought  for  in  concrete  biological  facts 
and not in the realm of mathematics.

26. POISSON'S PROBABILITY FUNCTION

In certain statistical series it frequently happens that the semi-invariants of higher order than zero all are equal, or that

$$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_r = \lambda.$$

We shall for the present limit our discussion to homograde statistical series where the variates always are positive and integral, and where therefore the definition of the semi-invariants is of the form:

$$e^{\frac{\lambda\omega}{1!} + \frac{\lambda\omega^2}{2!} + \frac{\lambda\omega^3}{3!} + \dots} = \sum \varphi(x)\, e^{x\omega}$$

or

$$e^{\lambda(e^{\omega} - 1)} = \sum \varphi(x)\, e^{x\omega} \quad\text{for } x = 0, 1, 2, 3, \dots,$$

which also can be written as

$$e^{-\lambda}\left[1 + \frac{\lambda e^{\omega}}{1!} + \frac{\lambda^2 e^{2\omega}}{2!} + \dots\right] = \varphi(0)\cdot 1 + \varphi(1)\,e^{\omega} + \varphi(2)\,e^{2\omega} + \dots$$

The coefficient of $e^{r\omega}$ gives the relative frequency or the probability for the occurrence of $x = r$, and we find therefore that

$$\varphi(x) = \psi(r) = \frac{e^{-\lambda}\lambda^r}{r!}.$$

This is the famous Poisson Exponential, so called after the French mathematician, Poisson, who first derived this expression in his Recherches sur la probabilité des jugements, but in an entirely different manner from the one we have indicated above.

The Poisson Exponential opens a new way for the treatment of statistical series which possess the attribute that all their semi-invariants of higher order than zero are equal, or nearly equal. It is readily seen that whereas the Laplacean probability function $\varphi_0(x)$ contains two parameters, $\lambda_1$ and $\sigma$, the probability function of Poisson contains only one parameter, $\lambda$.
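The defining property of the Poisson Exponential, that its semi-invariants of the first and higher orders are all equal to the single parameter, can be verified numerically for the first three orders. This is a minimal sketch with hypothetical function names; the parameter value 2.5 is an arbitrary assumption, and the infinite sums are truncated at 60 terms, beyond which the probabilities are negligible for this parameter.

```python
import math


def poisson(r, lam):
    """Poisson exponential psi(r) = exp(-lam) * lam**r / r!."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


def poisson_semi_invariants(lam, terms=60):
    """Total probability and first three semi-invariants from the power sums."""
    p = [poisson(r, lam) for r in range(terms)]
    s0 = sum(p)
    s1 = sum(r * pr for r, pr in enumerate(p))
    s2 = sum(r ** 2 * pr for r, pr in enumerate(p))
    s3 = sum(r ** 3 * pr for r, pr in enumerate(p))
    l1 = s1
    l2 = s2 - s1 ** 2
    l3 = s3 - 3 * s1 * s2 + 2 * s1 ** 3
    return s0, l1, l2, l3


total, l1, l2, l3 = poisson_semi_invariants(2.5)
print(total, l1, l2, l3)
```

The total should be very close to 1 and the three semi-invariants very close to 2.5.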

27,  POISSON—  We  have   already  seen  in  the 

Fi^C^jENCY         previous     chapters     that     the 
Gram-Charlier  frequency  curve 
could  be  written  as 

F{x)  =  Zci(pt(a:)  =  i:.CiHi{x)^^{x) 
for  i  =  0, 1,2,3,  .... 

where    ^q{x)  is  the  generating  Laplacean  proba- 
bility function. 

The idea now immediately suggests itself to use a similar method of expansion in the case of the Poisson probability function and to employ this exponential as a generating function in the same manner as the Laplacean function. We are, however, in the present case of the Poisson exponential dealing with a generating function which so far has been defined for positive integral values only and, therefore, represents a discrete function. For this reason it will be impossible to express the series as the sum-products of the successive derivatives of the generating function and their correlated parameters $c$. We can, however, in the case of integral variates express the series by means of finite differences and write $F(x)$ as follows:

$$F(x) = c_0\,\psi(x) + c_1\,\Delta\psi(x) + c_2\,\Delta^2\psi(x) + \dots \qquad (I)$$

where $\psi(x) = e^{-m} m^{x} / x!$ for $x = 0, 1, 2, 3, \dots$,

and

$$\Delta\psi(x) = \psi(x) - \psi(x-1),$$

$$\Delta^2\psi(x) = \Delta\psi(x) - \Delta\psi(x-1) = \psi(x) - 2\psi(x-1) + \psi(x-2).$$

The series (I) is known as the Poisson-Charlier frequency series or Charlier's B type of frequency curves.
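The construction of the series by repeated differencing may be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that $\psi(x)$ is taken as zero for negative $x$, so that the differences are defined at $x = 0$ and $x = 1$.

```python
import math

def psi(x, m):
    """Poisson exponential e^{-m} m^x / x!; assumed zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m**x / math.factorial(x)

def delta(f):
    """Backward difference as defined in the text: (delta f)(x) = f(x) - f(x - 1)."""
    return lambda x: f(x) - f(x - 1)

def type_b_series(x, m, coeffs):
    """Evaluate c0*psi(x) + c1*delta psi(x) + c2*delta^2 psi(x) + ...

    coeffs is the list [c0, c1, c2, ...]; the k-th term uses the k-th difference.
    """
    term = lambda t: psi(t, m)
    total = 0.0
    for c in coeffs:
        total += c * term(x)
        term = delta(term)  # move on to the next higher difference
    return total

# Hypothetical parameters, for illustration only.
print(type_b_series(3, 2.0, [1.0, 0.0, -0.1]))
```

Since each difference of $\psi$ sums to zero over all $x$, the series still sums to $c_0$, whatever the higher coefficients may be.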

The semi-invariants of these frequency series are given by the following relation:

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \dots} = \sum_{x=0}^{\infty} e^{x\omega}\,F(x).$$


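Expanding and equating the coefficients of equal powers of $\omega$ we have:

$$\lambda_0 = 1 = c_0 \sum \psi(x), \quad \text{or } c_0 = 1,$$

$$\lambda_1 = \sum x\,\bigl(\psi(x) + c_1\,\Delta\psi(x) + c_2\,\Delta^2\psi(x) + \dots\bigr), \qquad (II)$$

$$\lambda_1^2 + \lambda_2 = \sum x^2\,\bigl(\psi(x) + c_1\,\Delta\psi(x) + c_2\,\Delta^2\psi(x) + \dots\bigr).$$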

We now have

$$\sum \psi(x) = 1, \quad \text{and} \quad \sum x\,\psi(x) = \sum m\,e^{-m} m^{x-1} / (x-1)! = m \sum \psi(x-1) = m.$$

We also find from well-known formulas of the calculus of finite differences that¹

$$\sum x\,\Delta\psi(x) = -1$$

$$\sum x\,\Delta^2\psi(x) = 0$$

$$\sum x^2\,\Delta\psi(x) = -(2m + 1)$$

$$\sum x^2\,\Delta^2\psi(x) = 2$$

¹ These formulas can also be derived from the definition of the semi-invariants and the well-known relations between moments and semi-invariants as given on page 74 when we remember that according to our definition all semi-invariants in the Poisson exponential are equal to m.
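These sums are easily verified by direct summation. The sketch below assumes a hypothetical modulus $m = 3$, takes $\psi(x) = 0$ for negative $x$, and truncates each sum at 80 terms; the helper names are illustrative.

```python
import math

def psi(x, m):
    """Poisson exponential; assumed zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m) * m**x / math.factorial(x)

def weighted_sum(power, order, m, upper=80):
    """Sum over x = 0 .. upper-1 of x**power times the difference of psi of given order."""
    total = 0.0
    for x in range(upper):
        # Difference of order k written out with binomial coefficients.
        diff = sum(
            (-1) ** j * math.comb(order, j) * psi(x - j, m) for j in range(order + 1)
        )
        total += x**power * diff
    return total

m = 3.0  # hypothetical modulus, for illustration only
print(weighted_sum(1, 1, m))  # compare with -1
print(weighted_sum(1, 2, m))  # compare with 0
print(weighted_sum(2, 1, m))  # compare with -(2m + 1)
print(weighted_sum(2, 2, m))  # compare with 2
```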

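Substituting these values in (II), together with $\sum x^2\psi(x) = m^2 + m$, we obtain

$$\lambda_1 = m - c_1$$

$$\lambda_1^2 + \lambda_2 = m^2 + m - (2m + 1)\,c_1 + 2c_2$$

By letting $m = \lambda_1$ we can make the coefficient $c_1$ vanish, which results in

$$\lambda_1 = m, \qquad c_2 = \tfrac{1}{2}(\lambda_2 - m),$$

where the two semi-invariants $\lambda_1$ and $\lambda_2$ are calculated around the natural zero of the number scale as origin.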
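The determination of the constants from the two relations above may be sketched as follows. The function name is hypothetical; the trial values of $\lambda_1$ and $\lambda_2$ are those quoted in the numerical example of paragraph 28.

```python
def type_b_coefficients(lambda1, lambda2, m=None):
    """Solve the two relations for c1 and c2, given the modulus m.

    lambda1 = m - c1
    lambda1^2 + lambda2 = m^2 + m - (2m + 1) c1 + 2 c2

    If m is omitted it is set equal to lambda1, which makes c1 vanish.
    """
    if m is None:
        m = lambda1
    c1 = m - lambda1
    c2 = 0.5 * (lambda1**2 + lambda2 - m**2 - m + (2 * m + 1) * c1)
    return m, c1, c2

m, c1, c2 = type_b_coefficients(3.8754, 3.6257)
print(m, c1, c2)
```

With $m = \lambda_1$ the general expression for $c_2$ reduces to $\tfrac{1}{2}(\lambda_2 - m)$, in agreement with the result stated above.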

For the above discussion we have limited ourselves to the determination of the three constants $m$, $c_0$ and $c_2$. It is easy, however, to find the higher parameters $c_3, c_4, c_5, \dots$ from the relations between the moments of the Poisson function and the semi-invariants of order 3, 4, 5, etc. Charlier usually calls the parameter $m$ the modulus and $c_2$ the eccentricity of the B curve.

28. NUMERICAL EXAMPLES

As an illustration of the application of the Poisson-Charlier series we select the following series of observations on alpha particles radiated from a bar of Polonium as determined by Rutherford and Geiger.

The appended table states the number of times, $F(x)$, the number of particles given off in a long series of intervals, each lasting one-eighth of a minute, had a given value $x$:

 x    F(x)      x    F(x)      x    F(x)

 0     57       5    408      10     10
 1    203       6    273      11      4
 2    383       7    139      12      0
 3    525       8     45      13      1
 4    532       9     27      14      1

We are here dealing with integral variates which can assume positive values only and the observations are therefore eminently adaptable to the treatment by Poisson-Charlier curves. Selecting the natural zero as the origin of the coordinate system we find that the first two semi-invariants are

$$\lambda_1 = 3.8754, \quad \lambda_2 = 3.6257,$$

and we therefore have:

$$m = \lambda_1 = 3.88; \quad c_2 = \tfrac{1}{2}[\lambda_2 - m] = -0.125.$$

The equation for the frequency distribution of the total $N = 2608$ elements therefore becomes

$$F(x) = N\,[\psi_{3.88}(x) + (-0.125)\,\Delta^2\psi_{3.88}(x)].$$

The table below gives the values as fitted to the curve, $F(x)$:

Alpha Particles Discharged from Film of Polonium
(Rutherford and Geiger).

N = 2608,  m = 3.88,  c_2 = -0.125

(1)     (2)         (3)         (4)        (5)           (6)
 x      ψ(x)       Δ²ψ(x)      N×(2)    N×(3)×c_2     (4)+(5)

 0    .020668    +.020668      53.9       -6.7           47
 1    .080156    +.038820     209.0      -12.7          196
 2    .155455    +.015811     405.4       -5.2          400
 3    .201015    -.029793     524.2       +9.7          533
 4    .194967    -.051608     508.5      +16.8          525
 5    .151625    -.037654     394.5      +12.3          407
 6    .097850    -.009714     254.9       +3.2          258
 7    .054249    +.009814     141.2       -3.2          138
 8    .026316    +.015668      68.7       -5.1           64
 9    .011351    +.012968      29.6       -4.2           25
10    .004407    +.008021      11.5       -2.6            9
11    .001555    +.004092       4.1       -1.2            3
12    .000503    +.001800       1.3       -0.6            1
13    .000150    +.000699       0.4       -0.2            0
14    .000042    +.000245       0.1       -0.1            0
15    .000010    +.000076       0.0       -0.0            0
16    .000003    +.000025                                 0
17    .000001    +.000005                                 0

As a second example we offer our old friend, the distribution of flower petals in Ranunculus bulbosus. Selecting the zero point at $x = 5$ and computing the semi-invariants in the usual manner we obtain the following equation for the frequency curve:

$$F(x) = 222\,\psi(x) + 31.5\,\Delta^2\psi(x), \quad m = 0.631.$$

A comparison between calculated and observed values follows:

x          5       6       7      8      9     10

Calc.    134.9    51.6    22.5    9.5    2.9    0.6
Obs.     133      55      23      7      2      2

29. TRANSFORMATION OF THE VARIATE

For integral variates we have shown that the Poisson frequency curve possesses the important property that all its semi-invariants are equal. Now while a frequency distribution of a certain integral variate, $x$, may perhaps not possess this property, it may, however, very well happen after a suitable linear transformation has been made, that the variate thus transformed will be subject to the laws of Poisson's function.

Let $z = ax - b$ represent the linear transformation which is subject to the above laws with a series of semi-invariants all equal to $m$.



These semi-invariants according to the properties set forth in paragraph 5 are therefore

$$m = \lambda_1(z) = a\,\lambda_1(x) - b$$

$$m = \lambda_2(z) = a^2\,\lambda_2(x)$$

$$m = \lambda_3(z) = a^3\,\lambda_3(x)$$

and our problem is to find the unknown parameters $a$, $b$ and $m$.

Simple algebraic methods, which it will not be necessary to dwell upon, give the following results:

$$a = \lambda_2 : \lambda_3$$

$$m = \lambda_2^3 : \lambda_3^2$$

$$b = a\,\lambda_1 - m$$

As a numerical illustration of this transformation we choose from Jørgensen a series of observations by Davenport on the frequency distribution of glands in the right foreleg of 2000 female swine.

No. of glands     0    1    2    3    4    5    6    7    8    9   10
Frequency        15  209  365  482  414  277  134   72   22    8    2

The values of the three first semi-invariants are

$$\lambda_1 = 3.501, \quad \lambda_2 = 2.825, \quad \lambda_3 = 2.417,$$

$$a = 2.825 : 2.417 = 1.168,$$

$$m = 2.825^3 : 2.417^2 = 3.859,$$

$$b = (1.168)(3.501) - 3.859 = 0.230.$$
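The whole computation, from the frequency table to the constants of the transformation, can be sketched as follows. The function names are hypothetical. The semi-invariants of orders 1 to 3 are taken as the mean and the second and third moments about the mean; since the sketch carries more decimals than the text, its results may differ from the printed ones in the last figure.

```python
glands = list(range(11))
frequency = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]

def first_three_semi_invariants(values, freqs):
    """Mean, and second and third moments about the mean, of a frequency table."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    mu2 = sum((v - mean) ** 2 * f for v, f in zip(values, freqs)) / n
    mu3 = sum((v - mean) ** 3 * f for v, f in zip(values, freqs)) / n
    return mean, mu2, mu3

def poisson_transformation(l1, l2, l3):
    """Constants a, m, b of z = a*x - b making the first three semi-invariants equal m."""
    a = l2 / l3
    m = l2**3 / l3**2
    b = a * l1 - m
    return a, m, b

l1, l2, l3 = first_three_semi_invariants(glands, frequency)
a, m, b = poisson_transformation(l1, l2, l3)
print(round(l1, 3), round(l2, 3), round(l3, 3))
print(a, m, b)
```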

The new variable then becomes $z = ax - b$ and the transformed Poisson probability function takes on the form:

$$\psi(z) = \frac{e^{-m} m^{z}}{z!}.$$

In general, however, we will find that $z$ is not a whole number and the expression $z!$ therefore has no meaning, from the point of view of factorials at least. This difficulty may, however, be overcome through the introduction of the well-known Gamma Function, $\Gamma(z + 1)$, which holds true for any positive or negative real value of $z$ and which in the case of integral values of $z$ reduces to $\Gamma(z + 1) = z!$

Hence we can write the transformed Poisson probability function as

$$\psi(z) = \frac{e^{-m} m^{z}}{\Gamma(z + 1)}.$$

Tables to 7 decimal places of the Gamma Function, or rather for the expression $-\log \Gamma(z + 1)$, have been computed by Jørgensen in his Frekvensflader and Korrelation from z = -5 to z = 1^, progressing by intervals of 0.01.

By means of this table and the tables of ordinary logarithms it is now easy to find the values of $\psi(z)$ in the case of the example relating to the number of glands in female swine. The detailed computation is shown below.*


(1)     (2)          (3)            (4)              (5)               (6)       (7)
 x       z       -log Γ(z+1)      log m^z     (3)+(4)+log e^(-m)       ψ(z)      F(x)

 0     -.230        .9209          .8651          .1101-2             .0129      30.1
 1     +.938        .0108          .5500          .8849-2             .0767     179.2
 2     2.106        .6555          .2350          .2146-1             .1639     382.9
 3     3.274        .0679          .9199          .3119-1             .2051     479.1
 4     4.442        .3216          .6048          .2501-1             .1780     415.8
 5     5.610        .4547          .2897          .0685-1             .1171     273.6
 6     6.778        .4904          .9746          .7891-2             .0615     143.7
 7     7.946        .4446          .6595          .4282-2             .0268      62.6
 8     9.114        .3285          .3444          .9970-3             .0099      23.1
 9    10.282        .1506          .0294          .5041-3             .0032       7.5
10    11.450        .9177          .7143          .9561-4             .0009       2.1

* The characteristics of the logarithms have been omitted in this table (except in column 5) and only the positive mantissas are shown. Column 7 represents the 2000 individual observations pro rated according to column 6.
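Where printed tables of the Gamma Function are not at hand, the same computation may be sketched with a library routine for $\log \Gamma$. The sketch below is an illustration under stated assumptions: it uses Python's `math.lgamma` with natural logarithms in place of the tabulated common logarithms, takes the rounded constants $a = 1.168$, $b = 0.230$, $m = 3.859$ from the text, and prorates the 2000 observations according to $\psi(z)$ as described in the footnote.

```python
import math

a, b, m, total = 1.168, 0.230, 3.859, 2000

def psi(z):
    """Transformed Poisson function e^{-m} m^z / Gamma(z + 1), evaluated via logarithms."""
    return math.exp(-m + z * math.log(m) - math.lgamma(z + 1))

z_values = [a * x - b for x in range(11)]
psi_values = [psi(z) for z in z_values]

# Column 7: the 2000 observations pro rated according to column 6.
scale = total / sum(psi_values)
fitted = [scale * p for p in psi_values]

for x, (z, p, f) in enumerate(zip(z_values, psi_values, fitted)):
    print(f"{x:2d}  z={z:7.3f}  psi={p:.4f}  F={f:6.1f}")
```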


CHAPTER  II 

(TRANSLATED  BY  MR.  VIGFUSSON) 


THE  HUMAN  DEATH  CURVE 

1. INTRODUCTORY REMARKS

In the following paragraphs I intend to discuss a method of constructing mortality tables from mortuary records by sex, age and cause of death, but without reference to or knowledge of the exposed to risk at various ages. This proposed method is indeed one which has been severely criticized in certain quarters, and several critics flatly deny that it is possible to construct mortality tables from such data without detailed information of the exposed to risk. It is, however, a very dangerous practice to say that a certain thing is impossible. The true scientist, least of all, should attempt to set limits for the extension of human knowledge. It is still remembered how the great Auguste Comte once denied that it ever would be possible to determine the chemical constituents of the celestial bodies. Only a few years after this emphatic denial by the brilliant Frenchman the spectroscope was discovered, by means of which we have been able to detect a number of chemical elements of other worlds than that of our own little earth. It is but fair to say that the method which we here shall describe has met with rather determined opposition in certain actuarial quarters. Under such circumstances it is natural that the process will be viewed in a light of scepticism and criticism. I welcome such an attitude because it has been my purpose to present the following studies for further investigation and not to force them upon my readers as authoritative or as a kind of infallible dogma.

In presenting the outlines of the proposed method I wish to state that it has never been the intention to supplant the orthodox methods of constructing mortality tables where we have exact information of the so-called "exposed to risk" or number living at various ages. Numerous and very important examples, however, offer themselves in actuarial and statistical practice where such information is not available. Most of the greater American Life Insurance Companies, especially those writing the so-called industrial insurance, have on hand an enormous amount of information of deaths by sex, attained age and by cause of death among their policyholders. Even the mortuary records of certain occupations, as for instance metal and coal miners, among the death claims in the industrial class are so numerous that it would be possible to construct a mortality table for such professions if we knew the exact number exposed to risk at various ages. Such information is, however, in the majority of cases wanting, or could only be obtained by means of a great expenditure of time and labor. Again, as Mr. F. S. Crum has pointed out in an article in the "Insurance and Commercial Magazine", a number of cities and states in the United States give from year to year very detailed information in regard to mortuary records by sex, age at death and cause of death. On account of the intense migration taking place in certain sections of the United States, especially in those of an industrial character, it is, however, impossible to know the exact population at various ages, except in the particular years in which the federal or state census has been taken. Since for all but a few states of this country the intercensal period is no less than ten years, the determination of the population composition by age and sex for a given locality and intercensal year, with any degree of accuracy, becomes a practical impossibility without a special count. Such a count or census of a specific locality or a single city is, however, a costly undertaking at its best, for which the necessary funds are rarely available. In all such instances the mortuary records are practically worthless in so far as the construction and computation of death rates are concerned, if we are to rely solely upon the usual method of constructing mortality tables. It will therefore readily be seen that, apart from purely academic interests, the possibility of establishing a method of constructing mortality tables without knowing the population exposed to risk at various ages would be of great practical value, and I deem no apology necessary to present the following method, which intends to overcome this very obstacle of having no information of the exposures.

2. EMPIRICAL AND INDUCTIVE METHODS OF SOLUTION

In order to bring the method into the proper perspective it will be of value to contrast it with the ordinary methods followed in the construction of mortality tables. Let us therefore briefly review those methods and principles commonly employed by actuaries and statisticians. A certain number, say $L_x$ persons at age $x$, are kept under observation for a full calendar year and the number, $D_x$, who die among the original entrants during the same year is recorded. The ratio $D_x : L_x$ is then considered as the crude probability of dying at age $x$. Similar crude rates are obtained for all other ages and are then subjected to a more or less empirical process of graduation to smooth out the irregularities arising from what is considered as random sampling. One then chooses an arbitrary radix, say for instance 100,000 persons at age 10, which represents a hypothetical cohort of 10-year old children entering under our observation. This radix is then multiplied by the previously constructed value of $q_{10}$ and the product represents the number dying at age 10. This number, $d_{10}$, is subtracted from $l_{10} = 100,000$ and the difference is the number living at age 11 or $l_{11}$. This latter number is then multiplied by $q_{11}$ and the result is $d_{11}$, or the number dying at age 11 out of the original cohort of 100,000. In this way one continues for all ages up to 105, or so.

It is to be noted that the column of $q_x$ in this process represents the fundamental column while the columns of $l_x$ and $d_x$ are purely auxiliary columns.
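The customary process just described, in which $d_x = l_x q_x$ and $l_{x+1} = l_x - d_x$, may be sketched in a few lines. The function name is hypothetical and the values of $q_x$ below are invented solely for illustration; they are not taken from any mortality table.

```python
def life_table(q, radix=100_000, start_age=10):
    """Build the auxiliary columns l_x and d_x from graduated probabilities q_x.

    Returns a list of rows (age, l_x, d_x), using d_x = l_x * q_x and
    l_{x+1} = l_x - d_x, starting from the chosen radix.
    """
    rows = []
    living = float(radix)
    for offset, qx in enumerate(q):
        deaths = living * qx
        rows.append((start_age + offset, living, deaths))
        living -= deaths
    return rows

# Invented q values for ages 10 to 13, for illustration only.
q_example = [0.002, 0.002, 0.003, 0.003]
table = life_table(q_example)
for age, lx, dx in table:
    print(f"age {age}:  l_x = {lx:10.1f}   d_x = {dx:7.1f}")
```

The sketch makes plain that the whole table is determined by the column of $q_x$ and the radix alone.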

Allow us here to ask a simple question. Do these empirically derived numbers of deaths at various ages out of an original cohort of 100,000 entrants at age 10 give us any insight or clue as to the exact nature of the biological phenomenon known as death, and are we by this method enabled to lift the veil and trace the numerous causes which must have been at work and served to produce the total effect, the $d_x$ curve, of which we by means of the usual methods have a purely empirical representation? I fear that this question will have to be answered in the negative. The usual actuarial methods do not give us a single glance into the relation between cause and effect, which after all is the ultimate object of investigation for all real science. Probably some critics would answer that they are not interested in investigating causal relations. Such an attitude of indifference is, however, very dangerous for a statistician or an actuary whose very work rests upon the validity of the law of causality. We may, however, overlook this apparent inconsistency of the empiricists and turn our attention to the proposed methods of constructing mortality tables along inductive lines, or by the process which Jevons has termed a complete induction.

Such a process we should find diametrically opposite to the methods of the empiricists, both in respect to points of attack and deduction. In the case of the empiricists the $q_x$ is the initial and fundamental function from which the $d_x$ column is computed as a mere by-product. The rationalistic method starts with the $d_x$ column and terminates with the $q_x$ as the by-product.

Being primarily interested in the absolute number of deaths and not in the relative frequencies of deaths at various ages, our first question is therefore, "What is the form of the frequency curve representing the deaths at various ages among the survivors of the original group of 100,000 entrants at age 10?" Right here we can, strange to say, apply some purely a priori knowledge. We know a priori that the curve must be finite in extent, because of the very fact that there is a definite limit to human life, and we also know that it assumes only positive values. There can be no negative numbers of deaths unless we were to regard the reported theological miracles of resurrections from the Jewish-Christian religion as such. This information about the death curve, or the curve of $d_x$, is, however, not sufficient for use as a basis for our deductions. We must therefore look about for additional information, whether of an a priori or an a posteriori nature and of such a general character that it can be adopted as a hypothesis.

3. GENERAL PROPERTIES OF THE "DEATH CURVE"

It was Poincaré who once said that every generalization is a hypothesis. Hence we shall look for some general characteristics which all mortality tables have in common in the age interval under consideration (age 10 and upwards). Let us take any mortality table, I do not care from what part of the world, and examine the general trend of the curve traced by the values of $d_x$ for various ages. The curve rises gradually from the age of ten. The number of deaths among the survivors at various ages will increase, although not uniformly, until the ages around 70 or 75 are reached. At this age interval we generally encounter a maximum. From the ages between 70 and 75 and for higher ages the number of deaths among the survivors will decrease at a more rapid rate than at the earlier stages of life. After the age of 85 only a small number of the veteran cohort are still alive. After the age of 90 only a few centenarians struggle along, keeping up a hopeless fight with the grim reaper, Death, until eventually all are carried off between the ages of 110 and 115. We can much better illustrate this process of the struggle between the surviving members at various ages of the cohort and the opposing forces as marshalled by the ultimate victor, Death, through a graphical representation. The chart on page 114 shows a mortality graph of the male population in Denmark (1906-1910) from ages 10 and upwards as constructed by the Royal Danish Statistical Bureau. The ordinates of the curve show the number of deaths at various ages among the survivors of the original cohort of 100,000 entrants at age 10. We notice a gradual increase from the younger ages until the age of 77, where a maximum or high crest is encountered. From that age a rapid decline takes place until the curve approaches the abscissa with a strongly marked asymptotic tendency after the age of 90. At the age of 110 all the members of the cohort have lost out and death stands as the undisputed victor, a victor among a mass of graves. The curve we thus have traced may properly be called "The Curve of Death". On the same chart I have also shown a graphical representation of a comparison between the Danish death curve and the corresponding death curves of males for England and Wales in the period 1909-1911, Norway 1900-1910, France 1908-1913 and United States period 1909-1911, all based upon an original radix of 1,000,000 entrants at age 10.

We will notice quite important variations in these curves. The curves for the Scandinavian countries show a relatively heavy clustering around the maximum point which in the case of Denmark is reached at age 75, in England at age 73, and in France at age 72. The Danish curve is also more symmetrical and shows a more uniform clustering tendency around the maximum value than the other curves. The asymmetry or skewness is most pronounced in the American curve, due to the comparatively greater number of deaths at younger ages than in the other tables. In the curve for Norwegian males I might mention another peculiarity which is absent in most other death curves. I have reference here to a secondary minor maximum or miniature crest at the age of 21. This maximum point, which is not very pronounced, arises from the heavy mortality among youths in Norway, whose male population always has consisted of rovers of the sea. A much larger proportion of young men braves the terrors of the sea in Norway than in any other country in the world. These sturdy descendants of the Vikings can be found in all parts of the globe. You are sure to find a weatherbeaten Norwegian tramp steamer even in the most deserted and far away harbours of our continents. But the sea takes its toll. The result is shown in the little peak in the curve of death among these sturdy Norwegian youths.*

Despite all these smaller irregularities all the curves have, however, certain well defined characteristics, namely:

1) An initial increase with age.

2) A well defined maximum point around the age period 70 to 80.

3) A more rapid decline from that point until the ultimate end of the mortality table.


* Another factor is the high number of deaths from tuberculosis typical of youth. See in this connexion the discussion in paragraph 12a under the Japanese Table.

4. RELATION OF FREQUENCY CURVES

The most interesting of these common characteristics is the encountering of a maximum point in the neighborhood of 70, and the subsequent decline toward the higher ages. This fact has a very important biometric significance, which we shall discuss in a somewhat detailed manner. Most of my readers are familiar with the so-called probability curve, expressed by the equation:

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \lambda_1)^2 / 2\sigma^2}.$$
This Laplacean or normal curve is represented in graphical form by the beautiful bell-shaped curve so well known to mathematical readers. Various approximations to this curve are continually encountered in numerous instances of observations relating to certain biological phenomena where certain measurable attributes of various sample populations tend to cluster around a certain norm, such as the measurements of heights of recruits, fin rays in fish, etc. We also know that where this tendency to cluster around the mean is asymmetrical or skew, it is in many cases possible to give a very close representation by the Laplacean-Charlier frequency curves.

Now let us return to our curves of death. It will be noted that all these curves for ages above the crest period 70 to 75 to a very marked degree approach the form of the normal probability curve and exhibit a marked clustering tendency around this particular period. The ages around 70, the Bible's "three score and ten", can therefore be looked upon as a norm of life around which the deaths of the original cohort group themselves in more or less correspondence with the binomial probability law. This pronounced grouping tendency is a very significant biological phenomenon, which it might be of interest to dwell upon.

If all the members of our original cohort were identical as to physical constitution and characteristics, if they all were exposed to identically the same outward influences acting upon their mode of life, it becomes evident from the law of causality, which is the basis and justification of every collection of statistical data, that all members would die at the same moment. We see, however, immediately that such hypothetical conditions are not present in human society. The paramount feature of our material world is variation. No two persons are alike in regard to physical constitution. Certain inherited characteristics, which are present in the individual in more or less pronounced form, make themselves felt. No two persons or groups of persons can be said to be exposed to the same outward influences. The clergyman and college professor living a sort of tranquil and sheltered life are not exposed to the same dangers as the working man or the man in business life. All these and other factors, almost infinite in number, tend to produce a decided variation in the actual duration of life. Of these influencing factors those relating to purely inherited or natural characteristics are without doubt the most powerful. If it were possible to eliminate certain forms of deaths due to infectious diseases, tuberculosis and accidents, causes more or less due to outward influences, we should have left a number of causes due to a gradual wearing out of the human system, similar in many respects to the deterioration of the mechanism in ordinary machinery. The death curve from such causes of death would be more related to the normal curve than the death curve which includes causes of death from non-inherent or anterior causes as mentioned above. This statement is borne out in the shape of the Danish death curve. In Denmark, where a very determined and largely successful fight has been carried on against tuberculosis, and where the accident rate is very low, we also find that the curve is more symmetrical than for instance in this country or in England.

This  tendency  to  an  approach  towards  the  bi- 
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nomial  probability  curve  was  already  noted  by 
Lexis,  who  from  such  considerations  tried  to  de- 
termine what  he  called  a  "Normalalter"  or  normal 
age  for  various  countries  and  sample  populations. 
Speaking  of  this  attempt  the  eminent  Danish  sta- 
tistician, Harald  Westerga-ard,  says  in  his  „Sta- 
tistikens  Teori  i  Grundrids"  (Copenhagen  1916) 
"An  unsually  interesting  attempt  has  been  made 
by  Lexis  to  determine  the  noTmal  age  of  man. 
A  mortality  table  will,  as  a  rule,  have  two 
strongly  dominant  maximum  points  for  the  num- 
ber of  deaths.  During  the  first  year  of  life  there 
dies  a  comparatively  large  number.  From  the  age 
of  1  the  number  of  deaths  decreases  and  reaches 
its  lowest  point  in  early  youth.  It  then  again 
begins  to  increase,  at  times  in  wavelike  motions, 
until  the  maximum  point  is  reached  at  the  old 
age  period". 

"The  clustering  ajround  the  latter  point  has 
now  a  great  likeness  with  the  normal  or  Gaussian 
curve,  and  we  might  for  this  reason  call  this 
specific  age  the  normal  life  age.  For  the  cal- 
culation of  such  a  normal  age  the  argument  may 
be  put  forth  that  experience  shows  that  the  great 
variations  in  mortality  tend  to  disappear  in  old 
age. Let the rate of mortality in a certain generation
at age x be $\mu_x$ and the number of the corresponding
survivors be $l_x$. The quantity $\mu_x l_x$ will then
increase from a certain point, while $l_x$ decreases,
in the beginning slowly, but later on at a more rapid
pace. During a long period of life the quantity
$\mu_x l_x$, the number of deaths at a certain
age, will increase with age. Later on a reversed
motion  takes  place.  But  when  this  reversion  will 
occur  depends  on  many  conditions,  the  successful 
fight  against  certain  diseases,  progress  in  econo- 
mic conditions,  or  change  in  the  mode  of  living. 
All  this  exercises  an  important  influence,  and  the 
maximum  point  occurs  therefore  sometimes  sooner 
and  sometimes  later.  It  is  also  important  to  in- 
vestigate the  natural  selection  in  old  age,  which 
so  to  say  divides  the  population  in  different  strata, 
each  with  its  own  state  of  health.  The  healthiest 
of  such  groups  will  with  the  increase  in  age  play 
a  greater  role.  Here  as  everywhere  it  is  the  more 
important  problem  to  study  the  clustering  around 
the  mean  inside  the  special  groups  rather  than  to 
attempt  to  find  a  derived  expression  for  the  morta- 
lity. On  the  other  hand,  the  correspondence  be- 
tween the  normal  curve  as  established  by  Lexis 
is  another  testimony  to  the  fact  that  this  curve 
or  formula  very  often  can  be  applied,  even  in 
complicated expressions".

5. THE DEATH CURVE, A COMPOUND CURVE

Lexis was satisfied to determine the normal age. A more
ambitious attempt to investigate the mortality by means of frequency curves
throughout  the  whole  period  of  life  was  made  by 
the  eminent  English  biometrician,  Pearson,  in  a 
brilliant  essay  in  his  "Chances  of  Death".  Pear- 
son took  the  number  of  deaths  in  the  English 
Life  Table  No.  4  (males)  and  succeeded  in  break- 
ing up  the  compound  curve  into  five  component 
curves  typical  of  old  age,  middle  age,  youth,  child- 
hood and  infancy.  I  want  to  advise  my  readers  to 
study  this  brilliant  and  illuminating  essay,  especi- 
ally on  account  of  its  beautiful  form  of  exposition 
which  makes  the  whole  subject  appear  in  a  most 
interesting  light. 

Speaking of this attempt by Pearson, the
American actuary, Henderson, is of the opinion
that "the method has not, however, been applied
to other tables and it is difficult to lay a firm
foundation for it, because no analysis of the deaths
into natural divisions by causes or otherwise has
yet been made such that the totals in the various
groups would conform to these (the Pearson)
frequency curves". We shall later on come back
to this statement by Henderson, which we feel
is a partial truth only. On the other hand, it must
be admitted that the system of Pearson's types of
skew frequency curves (by this time twelve in
number) is by no means easy to handle in
practical work and often requires a large amount
of arithmetical calculation. Moreover, there seems
to be no rigorous philosophical foundation for the
Pearsonian types of curves, and they can at their
best only be said to be exceedingly powerful and
neat instruments of graduation or interpolation.

On the other hand, I am of the opinion that
the goal can be reached more easily if we, instead
of the Pearsonian curve types, make use of the
Laplacean-Charlier and Poisson-Charlier frequency
curves, which are expressed in infinite series of
the form:

$$F(x) = \varphi(x) + \beta_3 \varphi^{III}(x) + \beta_4 \varphi^{IV}(x) + \dots \tag{2}$$

or

$$F(x) = \psi(x) + \gamma_2 \Delta^2 \psi(x) + \gamma_3 \Delta^3 \psi(x) + \dots \tag{3}$$

These two curve types have been treated
elsewhere by Gram, Charlier, Thiele, Edgeworth,
Jørgensen, Guldberg and other investigators, and
it is therefore not necessary to dwell further upon
their analytical properties, which were discussed
in Chapter I.
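As a reminder of how the Laplacean-Charlier series (2) behaves numerically, the following is a minimal sketch only. It assumes that the derivatives are taken with respect to the standardized variable z = (x - mean) / dispersion and that the series is cut off after the fourth derivative; the function names `normal_density`, `charlier_type_a` and `area_under` are hypothetical and are not taken from the text.

```python
import math


def normal_density(z):
    """Laplacean (normal) probability curve phi(z) for a standardized z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def charlier_type_a(x, mean, dispersion, beta3=0.0, beta4=0.0):
    """Leading terms of series (2): phi + beta3 * phi''' + beta4 * phi''''.

    Assumption of this sketch: derivatives are taken in the standardized
    variable z, and the result is rescaled to a density in x.
    """
    z = (x - mean) / dispersion
    phi = normal_density(z)
    # phi'''(z) = -(z^3 - 3z) phi(z) and phi''''(z) = (z^4 - 6z^2 + 3) phi(z)
    third = -(z ** 3 - 3.0 * z) * phi
    fourth = (z ** 4 - 6.0 * z ** 2 + 3.0) * phi
    return (phi + beta3 * third + beta4 * fourth) / dispersion


def area_under(curve, lower, upper, step=0.05):
    """Rectangle-rule area, adequate for a smooth curve vanishing at both ends."""
    n = int(round((upper - lower) / step))
    return sum(curve(lower + i * step) for i in range(n + 1)) * step
```

Because the derivative terms integrate to zero, the area of the curve stays equal to that of the generating normal curve whatever coefficients are chosen; the coefficients alter only the shape.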

Returning now to the general form of our $d_x$
curve of the mortality table which we discussed
above, it is readily seen that this curve has all the
properties of a compound frequency curve, that
is, a curve which is composed of several minor or
subsidiary frequency curves, generally skew in
appearance. As proven both by Charlier and by
Jørgensen, any single valued and positive compound
frequency curve vanishing at both $+\infty$ and
$-\infty$ can be represented as the sum of Laplacean-
Charlier and Poisson-Charlier frequency curves.
We know thus a priori that the $d_x$ curve is
compounded of the two types of frequency curves. But
how are we to determine the separate component
curves? It is readily admitted that no a priori
reason  will  guide  us  here.  The  purely  empirical 
observer  might  therefore  abandon  the  project 
right  here,  because  to  all  appearances  it  would 
seem  hopeless  to  attempt  a  solution  by  purely 
empirical  means.  The  positive  rationalist  does 
not  despair  so  easily.  "Very  well",  he  says,  "if 
we  can  not  make  further  progress  by  purely 
empirical  means,  we  are  at  least  permitted  to  try 
deductive  reasoning  and  attempt  to  bridge  the  gap 
by  means  of  an  hypothesis".  The  hypothesis  I 
shall  adopt  is  the  following : 

The frequency distribution of deaths according
to age from certain groups of causes
of death among the survivors in a mortality
table tends to cluster around certain ages in
such a manner that the frequency distribution
can be represented by either a Laplacean-Charlier
or a Poisson-Charlier frequency
curve.

A  study  of  mortuary  records  by  age  and  cause 
of  death  immediately  supports  this  hypothesis. 
We  notice,  for  instance,  that  diseases  such  as 
scarlet  fever,  measles,  whooping  cough  and  diph- 
theria often  cause  death  among  children,  but 
rarely seem to affect older people. We know, for
instance, that there is a much greater probability
that a 5-year old boy will die from scarlet fever
than that a man at the age of 40 will die from the
same disease. On the other hand, there is quite
a large probability that an old man at age 85
will die from diseases of the prostate gland, while
such an occurrence is almost unheard of among
boys.  Similarly  deaths  from  cancer  and  Bright's 
disease  are  very  rare  in  youth,  but  quite  frequent 
in  early  old  age.  Tuberculosis,  on  the  other  hand, 
causes  its  greatest  ravages  in  middle  life,  and  has 
but  little  effect  upon  older  ages. 


6. MATHEMATICAL PROPERTIES OF THE COMPONENT FREQUENCY CURVES

Leaving, however, the question of the grouping of causes
of death into a limited number of typical groups to a
later discussion, we shall in the meantime see how the
hypothesis can carry us over the difficulties. Let us
for the moment
assume that we are able to group the causes of
death  into  say  7  or  8  groups.  We  shall  also  as- 
sume that  we  know  the  percentage  frequency 
distribution  of  deaths  according  to  age  in  each 
of  the  groups.  This  means  in  other  words  that 
we  know  the  equation  of  the  frequency  curves 
giving  the  percentage  distribution.  Let  the  ana- 
lytical expression  for  these  frequency  curves  be 
denoted  by  the  symbols: 

$$F_{I}(x),\ F_{II}(x),\ F_{III}(x),\ \dots,\ F_{VIII}(x). \tag{4}$$

Again,  let  the  total  number  of  deaths  among  the 
survivors  in  the  mortality  table  from  causes  of 
death according to the above grouping be denoted by

$$N_{I},\ N_{II},\ N_{III},\ N_{IV},\ \dots,\ N_{VIII} \text{ respectively.} \tag{5}$$

The  number  of  deaths  in  a  certain  age  interval, 
say  between  50-54  can  then  be  expressed  as 
follows  : 

$$\sum_{x=50}^{54} d_x = \sum_{x=50}^{54} N_{I} F_{I}(x) + \sum_{x=50}^{54} N_{II} F_{II}(x) + \dots + \sum_{x=50}^{54} N_{VIII} F_{VIII}(x). \tag{6}$$


In this relation the only known quantities are
the equations for the frequency curves $F_{I}(x)$,
$F_{II}(x)$, ..., $F_{VIII}(x)$ of the percentage frequency
distribution according to age in each of the eight
groups. Neither $d_x$ nor any of the various N's are
known. The only relation we know a priori among
the quantities N is the following:

$$N_{I} + N_{II} + N_{III} + \dots + N_{VIII} = 1{,}000{,}000. \tag{7}$$

The  latter  equation  is  simply  a  mathematical 
expression  for  the  simple  fact  that  the  sum  total 
of  the  sub-totals  of  the  various  groups  of  causes 
of  death,  in  other  words  the  deaths  from  all 
causes  among  the  survivors  in  the  mortality  table, 
must  equal  the  radix  of  the  entrants  of  our  orig- 
inal cohort  of  1,000,000  lives  at  age  10.  Viewed 
strictly  from  the  standpoint  of  frequency  curves, 
we  might  express  the  same  fact  by  saying  that 
the  sum  of  the  areas  of  the  various  component 
curves  must  equal  1,000,000. 
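Relations (6) and (7) may be sketched numerically as follows. This is an illustration under stated assumptions only: each component is taken as a plain normal curve (the skewness and excess terms are ignored), only two groups are used, and the areas, means and dispersions are hypothetical values chosen for the example. The names `normal_component` and `deaths_in_interval` are likewise hypothetical.

```python
import math


def normal_component(mean, dispersion):
    """Return F(x), the share of a group's deaths falling at age x (normal term only)."""
    def F(x):
        z = (x - mean) / dispersion
        return math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))
    return F


def deaths_in_interval(areas, curves, first_age, last_age):
    """Equation (6): sum over the ages of the interval of sum over groups N_i * F_i(x)."""
    return sum(
        n * F(x)
        for x in range(first_age, last_age + 1)
        for n, F in zip(areas, curves)
    )


# Hypothetical two-group illustration; the areas add up to the radix as in (7).
areas = [600_000, 400_000]
curves = [normal_component(72.0, 10.0), normal_component(45.0, 15.0)]
d_50_54 = deaths_in_interval(areas, curves, 50, 54)
```

The computation is linear in the areas, which is the property exploited in the paragraphs that follow: once the shapes F(x) are fixed, only the N's remain to be determined.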

It is readily seen that on the assumption that
the expressions of the different F(x) conform to
the above hypothesis it is possible to find $d_x$ for
any age or age interval if we can determine the
values of the different N's. It is in this possibility
that the importance of the proposed method lies,
and we shall now show how it is possible to determine
the N's without knowing the exposed to risk.

7. OBSERVATION EQUATIONS

Consider for the moment the following expression:

$$R_{III}(x) = \frac{\sum_{50}^{54} N_{III} F_{III}(x)}{\sum_{50}^{54} N_{I} F_{I}(x) + \sum_{50}^{54} N_{II} F_{II}(x) + \sum_{50}^{54} N_{III} F_{III}(x) + \dots + \sum_{50}^{54} N_{VIII} F_{VIII}(x)} \tag{8}$$


What does this equation represent? Simply the
proportionate ratio of deaths in group III to the
total number of deaths in all type groups (in
other words the deaths from all causes) in the age
interval 50-54. Such ratios are usually known as
proportional death ratios. It is readily seen that
these proportionate death ratios are dependent on
the deaths alone and absolutely independent of
the number exposed to risk, provided the total
number of deaths from all causes in a certain age
group is large enough to eliminate variations due
to random sampling.¹ In other words, we can find
a numerical value for the term $R_{III}(x)$ on the left
side of the equation from our death records alone
without reference to the exposed to risk in this
interval. Similar proportionate death ratios can
of course without difficulty be determined for the
other groups of causes of death and for arbitrary
ages or age intervals. In this manner we can
determine a system of observation equations with
known numerical values of $R_i(x)$ (i = I, II, III, ...).
The fact that the number of observation equations
in this system is much larger than the number of
the unknown N's makes it possible to determine
these unknowns by the method of least squares.

¹ Strictly speaking this statement is only true for an
age interval of one year or less and may in the case of
large perturbing influences in the population exposed to
risk be subject to appreciable errors when we use large
age intervals of 10 or more in our grouping for the
computing of R(x). When the age interval for the grouping
of causes of deaths by attained ages is 5 years or less
the error committed in assuming R(x) as being independent
of the number exposed to risk is in most cases
negligible. One of the difficulties encountered in the
construction of a mortality table for Massachusetts Males
was that the age interval used for the grouping was 10
years instead of 5 years or less. See in this connection
the remarks at the beginning of paragraph 11 and at
the conclusion of paragraph 16 of the present chapter.

Probably the simplest manner is first to determine
by simple approximation methods, or by
mere inspection, approximate values for the
various N's and then make final adjustments by
the method of least squares.

Let, for instance,

$${}'N_{I},\ {}'N_{II},\ {}'N_{III},\ \dots,\ {}'N_{VIII}$$

be the first approximations of the areas of the
various groups of frequency curves so that

$${}'N_{I} + {}'N_{II} + {}'N_{III} + \dots + {}'N_{VIII} = 1{,}000{,}000. \tag{9}$$

Let  us  furthermore  introduce  the  following 
symbols : 

$${}'N_{I} F_{I}(x) = \Phi_1(x),\quad {}'N_{II} F_{II}(x) = \Phi_2(x),\quad \dots,\quad {}'N_{VIII} F_{VIII}(x) = \Phi_8(x). \tag{10}$$

The different values of

$$\Phi_1(x),\ \Phi_2(x),\ \Phi_3(x),\ \dots,\ \Phi_8(x)$$

may then be regarded as a system of component
frequency curves to which we now must apply the
different correction factors $a_1, a_2, a_3, \dots, a_8$ in order
to fit the curves to the observed proportional death
ratios, R(x), for the various groups of typical
causes of death. Let us for example assume that
the observed death ratio of a certain age (or age
group), x, under a certain group of causes of
death, say group No. III, is $R_{III}(x)$. We have
then the following observation equation:

$$R_{III}(x) = a_3 \Phi_3(x) : [a_1 \Phi_1(x) + a_3 \Phi_3(x) + a_4 \Phi_4(x) + \dots + a_8 \Phi_8(x) + a_2 \Phi_2(x)] \tag{11}$$

Since the sum of the areas of the different component
curves necessarily must equal 1,000,000 it
is easy to see that we may write the factor $a_2$
in the last term of the denominator in the following
form:

$$\sum_{i=1}^{8} a_i \sum \Phi_i(x) = 1{,}000{,}000$$

or

$$a_2 = \Big(1{,}000{,}000 - \big[a_1 \sum \Phi_1(x) + a_3 \sum \Phi_3(x) + \dots + a_8 \sum \Phi_8(x)\big]\Big) : \sum \Phi_2(x) = k_0 - [k_1 a_1 + k_3 a_3 + \dots + k_8 a_8]$$

where

$$k_0 = \frac{1{,}000{,}000}{\sum \Phi_2(x)}, \qquad k_i = \frac{\sum \Phi_i(x)}{\sum \Phi_2(x)}. \tag{12}$$

The expression for $R_{III}(x)$ can then be put in the
following form:

$$R_{III}(x) = a_3 \Phi_3(x) : [a_1 \Phi_1(x) + a_3 \Phi_3(x) + \dots + a_8 \Phi_8(x) + (k_0 - k_1 a_1 - k_3 a_3 - \dots - k_8 a_8)\,\Phi_2(x)] \tag{13}$$

Similar observation equations for the other
groups are derived without difficulty.

Once  having  formed  the  observation  equations 
it  is  simply  a  matter  of  routine  work  to  compute 
the  normal  equations  from  which  the  values  of 
the  unknown  N's  can  be  found.  We  shall,  how- 
ever, not  go  into  detail  with  the  derivation  of  the 
necessary  formulas,  since  this  is  a  process  which 
belongs  wholly  to  the  domain  of  the  theory  of 
least  squares  and  which  has  received  adequate 
treatment  elsewhere.  (See  for  instance  Brunt's 
Combination  of  Observations.) 
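For the reader who wishes to see the routine in miniature, the passage from observation equations to normal equations can be sketched as below. This is a toy illustration under stated assumptions, not the computation of the text: the four observation equations in two unknowns are hypothetical, and the helper names `normal_equations` and `solve_2x2` are invented for the example.

```python
def normal_equations(rows, constants):
    """Form the normal equations (A^T A) a = A^T l from observation equations A a = l."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atl = [sum(r[i] * l for r, l in zip(rows, constants)) for i in range(n)]
    return ata, atl


def solve_2x2(m, v):
    """Solve a 2 x 2 linear system by Cramer's rule (enough for the toy example)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [
        (v[0] * m[1][1] - m[0][1] * v[1]) / det,
        (m[0][0] * v[1] - v[0] * m[1][0]) / det,
    ]


# Hypothetical example: four observation equations in two unknowns.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
constants = [1.1, 0.9, 2.0, 0.2]
ata, atl = normal_equations(rows, constants)
a1, a2 = solve_2x2(ata, atl)
```

With more observation equations than unknowns, the sums of products of the coefficients play the part of the single equations, which is why the number of normal equations always equals the number of unknowns.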

8. CLASSIFICATION OF CAUSES OF DEATH

We think it more advantageous to illustrate the method by
a concrete example. As an
illustration we may take the case of Michigan
Males in the period 1909–1915. The
mortuary records of Males in Michigan are
for that period given in the reports issued
annually by the Secretary of State on "Registration
of Births and Deaths, Marriages and Divorces
in Michigan". The deaths by sex, age and cause
of death are given in quinquennial age groups. A
very serious drawback is the grouping of all ages
above 80 into a single age group instead of in at
least 4 or 5 quinquennial age groups. This makes
it impossible to obtain good observation equations
for ages above 80. When we consider that about
one  fifth  of  the  original  entrants  at  age  10  in  the 
mortality  table  die  after  the  age  of  80,  it  is  readily 
seen  that  this  defect  in  the  Michigan  data  is  of  a 
very  serious  character,  which  makes  it  out  of  the 
question  to  determine  correctly  the  areas  of  the 
curves  for  middle  old  age  and  extreme  old  age. 
For ages below 70 these curves do not play so
important a role, and the method ought therefore
in these ages to yield satisfactory results. We now
make  the  assertion  that  the  deaths  among  the 
survivors  in  the  final  life  table  can  be  grouped  in 
the  following  typical  groups. 

Causes of Death typical of:

Group I      Extreme Old Age.
Group II     Middle Old Age.
Group III    Early Old Age.
Group IV     Middle Life.
Group V      Early Middle Life.
Group VI     Pulmonary Tuberculosis, Etc.
Group VIIa   Early Life Occupational Hazard.
Group VIIb   Middle Life Occupational Hazard.
Group VIII   Childhood.

The classification of causes of death according
to this scheme is given in the following table,
marked Table A.

Table A. Michigan Males 1909–1915

Classification of causes of death according to the
chosen system of curves.


No. in International
Classification.

GROUP I

81.        Diseases of the arteries.
124.       Diseases of the bladder.
125-133.   Other diseases of the genito-urinary system.
142.       Gangrene.
154.       Old age.
126.       Diseases of the prostate.

GROUP II

10.        Influenza.
47-48.     Rheumatism.
64.        Apoplexy.
65.        Softening of the brain.
66.        Paralysis.
79.        Heart disease.
82.        Embolism.
89.        Acute bronchitis.
90.        Chronic bronchitis.
91.        Broncho-pneumonia.
94.        Congestion of the lungs.
96-97.     Asthma and emphysema.
103.       Other diseases of the stomach.
105.       Diarrhea and enteritis (over 2 years).
14.        Dysentery.

GROUP III

39.        Cancer of the mouth.
40.        Cancer of the stomach and liver.
41.        Cancer of the intestines.
44.        Cancer of the skin.
45.        Cancer of other organs.
46.        Tumors.
50.        Diabetes.
53-54.     Leukemia and anemia.
63.        Other diseases of the spinal cord.
68.        Other forms of mental diseases.
80.        Angina pectoris.
109-110.   Hernia, intestinal obstruction, and other diseases of the intestines.
120.       Bright's disease.
121.       Other diseases of the kidneys.
123.       Calculi of urinary passages.

GROUP IV

56.        Alcoholism.
18.        Erysipelas.
62.        Locomotor ataxia.
73-76.     Other diseases of the nervous system.
77.        Pericarditis.
78.        Endocarditis.
83.        Diseases of the veins.
84.        Diseases of the lymphatics.
85-86.     Other diseases of the circulatory system.
87.        Diseases of the larynx.
88.        Diseases of the thyroid body.
92.        Pneumonia.
93.        Pleurisy.
95.        Gangrene of the lungs.
98.        Other diseases of the respiratory system.
99-101.    Diseases of the mouth, pharynx, and oesophagus.
111.       Acute yellow atrophy of the liver.
113.       Cirrhosis of the liver.
114.       Biliary calculi.
115-116.   Diseases of the liver and spleen.
118.       Other diseases of the digestive system.
143-145.   Furuncle, abscess, and other diseases of the skin.
147-149.   Diseases of the joints, and locomotor system.

GROUP V

4.         Malarial fever.
13.        Cholera nostras.
20.        Septicemia.
24.        Tetanus.
32.        Pott's disease.
33.        White swellings.
34.        Tuberculosis of other organs.
35.        Disseminated tuberculosis.
55.        Other general diseases.
60.        Encephalitis.
70-71.     Convulsions.
102.       Ulcer of the stomach.
117.       Peritonitis.
119.       Acute nephritis.
146.       Diseases of the bones.
155.       Suicide by poison.
156.       Suicide by asphyxia.
157.       Suicide by hanging.
158.       Suicide by drowning.
159.       Suicide by firearms.
160.       Suicide by cutting instruments.
161.       Suicide by jumping from high places.
163.       Suicide by other or unspecified means.
164-165.   Accidental poisonings.
166.       Conflagration.
167.       Burns (conflagration excepted).
168.       Inhalation of noxious gases.
172.       Traumatism by fall.
175-(2).   Traumatism by electric railway.
175-(3).   Traumatism by automobiles.
175-(4).   Traumatism by other vehicles.
176.       Traumatism by animals.
178.       Cold and freezing.
179.       Effects of heat.
185.       Fractures and dislocations (cause not specified).

GROUP VI

28.        Tuberculosis of the lungs.
29.        Miliary tuberculosis.
37-38.     Venereal diseases.
186.       Other accidental traumatism.
57-59.     Chronic poisoning.
67.        General paralysis of the insane.
31.        Abdominal tuberculosis.

GROUP VII

1.         Typhoid fever.
69.        Epilepsy.
108.       Appendicitis.
182.       Homicide.
169.       Accidental drowning.
170.       Traumatism by firearms.
171.       Traumatism by cutting instruments.
173.       Traumatism by mines and quarries.
174.       Traumatism by machinery.
175-(1).   Traumatism by railroads.
180.       Lightning.
61.        Meningitis.

GROUP VIII

5.         Smallpox.
6.         Measles.
7.         Scarlet fever.
8.         Whooping cough.
9.         Diphtheria and croup.
30.        Tubercular meningitis.
150.       Congenital malformations.

9. OUTLINE OF COMPUTING SCHEME

The number of deaths in the various groups according to the
above classification and arranged according to age during
the period 1909–1915 is given in the table B on page 140.

From  that  table  it  is  a  simple  matter  to  com- 
pute the  proportionate  death  ratios  of  the  separate 
groups  of  causes  of  death.  Such  a  computation  is 
shown  in  table  C  on  page  141. 
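The tabulation behind tables B and C may be sketched as follows, assuming that each death record carries an International Classification number and an age at death. Only a few entries of Table A are reproduced in the lookup, the death records are hypothetical, and the names `GROUP_OF_CAUSE`, `age_interval`, `tabulate` and `ratios_for_interval` are illustrative.

```python
from collections import Counter

# A few entries of Table A: International Classification number -> group.
GROUP_OF_CAUSE = {
    81: "I", 154: "I",       # diseases of the arteries; old age
    64: "II", 79: "II",      # apoplexy; heart disease
    40: "III", 120: "III",   # cancer of the stomach and liver; Bright's disease
    28: "VI",                # tuberculosis of the lungs
    7: "VIII",               # scarlet fever
}


def age_interval(age, width=5):
    """Quinquennial interval containing the given age, e.g. 52 -> (50, 54)."""
    lower = (age // width) * width
    return (lower, lower + width - 1)


def tabulate(records):
    """Count deaths by (group, age interval), in the manner of table B."""
    counts = Counter()
    for cause_number, age in records:
        counts[(GROUP_OF_CAUSE[cause_number], age_interval(age))] += 1
    return counts


def ratios_for_interval(counts, interval):
    """Proportionate death ratios of the groups in one age interval (table C)."""
    in_interval = {g: n for (g, iv), n in counts.items() if iv == interval}
    total = sum(in_interval.values())
    return {g: n / total for g, n in in_interval.items()}


# Hypothetical death records: (cause number, age at death).
records = [(79, 52), (64, 53), (120, 51), (28, 34), (81, 83), (79, 50)]
table_b = tabulate(records)
table_c_50_54 = ratios_for_interval(table_b, (50, 54))
```

Only deaths enter the computation; no count of the living exposed to risk is needed at any step.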

It is readily seen that these death ratios are
independent of the number exposed to risk. Moreover,
the number of observations seems to be sufficiently
large to eliminate serious variations due
to random sampling. This might perhaps not hold
true for the age intervals 10 to 14 and 15 to 19
where not alone random sampling is present, but
a somewhat modified classification seems necessary.
I have, however, not used the observed proportionate
death ratios for the two younger age
intervals in my computations, which only took into
account the ratios above 20. For this reason I do
not deem it necessary to go into a closer investigation
of a re-classification of causes of death for
these younger age groups. A more serious defect
which cannot be overcome is presented in the
ages above 80 where, as mentioned before, a
classification according to age is absent in the original
records for the state of Michigan. The fact that
the highest number of deaths (12,473) occurred
in ages above 80 makes this defect more serious
than the omission of a re-classification of causes
of death below 20.

So far we have only been concerned with the
first step in the complete induction according to
the model of Jevons, namely that of simple observation.
The next step in the induction is the hypothesis.
We present now the following working
hypothesis.

The frequency distribution of deaths according
to age of the above groups of causes of death
among the survivors of an original cohort of
1,000,000 entrants at age 10 can be represented by
a system of frequency curves determined by the
following characteristic parameters:


                          Parameters

Group   Mean (years)   Dispersion (years)   Skewness   Excess
I          79.5             9.5730           +.1056    +.0546
II         70.5            12.8000           +.0967    +.0126
III        65.5            13.6870           +.1248    +.0650
IV         59.5            17.0890           +.1790    -.0106
V          55.5            19.9411           +.0555    -.0367
VI         44.5            16.0352           -.0124    -.0272
VIIb       57.5            12.1552           +.0008    -.0005
VIIa    Poisson-Charlier Curve: Modulus = 28.5 years, Eccentricity = 1.0001
VIII    Poisson-Charlier Curve: Modulus = 13.5 years.

From  these  parameters  and  from  well-known 
tables  of  the  probability  or  normal  frequency  curve 
and  its  various  derivatives  it  is  easy  to  determine 
the  frequency  distribution  for  any  desired  interval. 
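As a rough indication of how such an interval frequency is obtained, the sketch below uses the generating normal curve alone, that is, it assumes the skewness and excess terms to be neglected. The result is therefore only a first term and will not reproduce the tabulated values of the text. The function names are hypothetical; the mean and dispersion are those given above for Group II.

```python
import math


def normal_cdf(x, mean, dispersion):
    """Area of the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (dispersion * math.sqrt(2.0))))


def interval_share(lower, upper, mean, dispersion):
    """Share of a group's deaths between exact ages lower and upper (normal term only)."""
    return normal_cdf(upper, mean, dispersion) - normal_cdf(lower, mean, dispersion)


# Group II of the parameter table: mean 70.5 years, dispersion 12.8 years.
# The interval 50-54 is taken as running from exact age 50 to exact age 55.
share_50_54 = interval_share(50.0, 55.0, 70.5, 12.8)
```

Multiplying such a share by the area N of the group gives the corresponding number of deaths in the interval, subject to the neglected correction terms.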

For this system of frequency curves we now
shall try to find the various areas
$N_{I}, N_{II}, N_{III}, \dots, N_{VIII}$ so as to conform to
the observed values of $R(x)$ in Table C. As a first
approach to the final values of N, we may by an
inspection (which of course is improved upon by
a long practice in curve fitting) choose the following
approximations.*

Group            Approximate Value of 'N

I                      123000
II                     366000
III                    183000
IV                     105000
V                       75000
VI                      70000
VIIa & VIIb             61000
VIII                    17000

Total                 1000000

These preliminary numerical values represent
the first approximations of the areas of the various
frequency curves. The sequence represented by

$${}'N_{I} F_{I}(x),\ {}'N_{II} F_{II}(x),\ {}'N_{III} F_{III}(x),\ \dots,\ {}'N_{VIII} F_{VIII}(x) \tag{14}$$

gives the number of deaths at age x. We notice
thus that by multiplying the various equations of
frequency curves for arbitrary age intervals with
their respective 'N's we can get a first approximation
of the final death curve. I give on page 144 an
approximate table arranged in 5 year intervals.

* These numbers represent as a matter of fact a first
rough approximation of the areas of the different
component curves by means of the method of point contours.
Hence it is to be expected that the final adjustments
will be comparatively small. This fact has, however, no
influence upon the application of the method.

We might now first compute the various factors
$k_0, k_1, \dots, k_8$ which will be common for all
observation equations. We have, referring to the
above formulas (11) and (12), for the various k's:

$$k_0 = \frac{1000000}{365995},\quad k_1 = \frac{123089}{365995},\quad k_3 = \frac{183045}{365995},\quad k_4 = \frac{104888}{365995},$$
$$k_5 = \frac{75030}{365995},\quad k_6 = \frac{69996}{365995},\quad k_7 = \frac{61003}{365995},\quad k_8 = \frac{17002}{365995} \tag{15}$$

or

$k_0 = 2.732$, $k_1 = 0.336$, $k_3 = 0.500$, $k_4 = 0.287$,
$k_5 = 0.205$, $k_6 = 0.191$, $k_7 = 0.167$, $k_8 = 0.046$.
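The arithmetic of (15) can be checked with a few lines. The sketch assumes only the summed areas printed in the numerators and denominators above; the dictionary keys (group numbers 1 to 8) and the function name `elimination_factors` are illustrative.

```python
# Summed areas of the first-approximation curves, as printed in (15).
summed_areas = {
    1: 123089, 2: 365995, 3: 183045, 4: 104888,
    5: 75030, 6: 69996, 7: 61003, 8: 17002,
}


def elimination_factors(areas, eliminated=2, radix=1_000_000):
    """Return k_0 and the k_i used to express the eliminated correction factor."""
    base = areas[eliminated]
    k0 = radix / base
    k = {group: value / base for group, value in areas.items() if group != eliminated}
    return k0, k


k0, k = elimination_factors(summed_areas)
rounded = {group: round(value, 3) for group, value in k.items()}
```

Group II is chosen for elimination in the text, presumably because it has the largest area; any other group could be eliminated by passing a different value of `eliminated`.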

To illustrate the further process of the computation
of the observation equations, let us take a
certain age interval, say the interval between
50-54. The value of $\Phi_2$ taken from the above table
is 163.39. The value of $R_{III}(x)$ for this interval is
0.234 (see table page 141). Hence we have the
following observation equation (16).

$$\begin{aligned}
0.234 = 104.53 a_3 : [\, & 15.76 a_1 + 104.53 a_3 + 84.16 a_4 + 64.52 a_5 + 73.55 a_6 + 35.01 a_7 + 0.00 a_8 \\
& + (2.732 - 0.336 a_1 - 0.500 a_3 - 0.287 a_4 - 0.205 a_5 - 0.191 a_6 - 0.167 a_7 - 0.046 a_8)\, 163.39\,].
\end{aligned} \tag{16}$$

After a few simple reductions this may be
brought to the following form:

$$9.16 a_1 + 99.19 a_3 - 8.72 a_4 - 7.26 a_5 - 9.91 a_6 - 1.81 a_7 + 1.76 a_8 - 104.45 = 0.$$
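The "few simple reductions" amount to multiplying out the denominator and collecting the terms in each correction factor. A minimal sketch, using only the numbers printed in (15) and (16) and a hypothetical helper `linearize`, is the following; the coefficient of each $a_i$ is $-R(\Phi_i - k_i \Phi_2)$, with $\Phi_3$ added for the group whose ratio is observed, and the constant term is $-R k_0 \Phi_2$.

```python
def linearize(ratio, target, phi, phi_2, k0, k):
    """Bring an observation equation of form (13) into linear form.

    Returns the coefficients of the correction factors and the constant term
    of the equation  sum(coeffs[i] * a_i) + constant = 0.
    """
    coeffs = {}
    for group, value in phi.items():
        c = -ratio * (value - k[group] * phi_2)
        if group == target:
            c += value  # the numerator a_target * Phi_target
        coeffs[group] = c
    constant = -ratio * k0 * phi_2
    return coeffs, constant


# Values for the age interval 50-54 as printed in (15) and (16).
phi = {1: 15.76, 3: 104.53, 4: 84.16, 5: 64.52, 6: 73.55, 7: 35.01, 8: 0.00}
k = {1: 0.336, 3: 0.500, 4: 0.287, 5: 0.205, 6: 0.191, 7: 0.167, 8: 0.046}
coeffs, constant = linearize(0.234, 3, phi, 163.39, 2.732, k)
```

The same helper serves for the other groups and age intervals by changing `target`, `ratio` and the values of `phi`.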

In  the  routine  work  I  usually  use  a  system  of 
computing  the  various  equations  which  is  out- 
lined in  detail  in  the  accompanying  tabular  scheme 
referring  to  all  the  groups  in  the  age  interval 
50-54  and  shown  on  pages  148-154. 

Similar observation equations are arrived at in
exactly the same manner for other groups and
other age intervals. For the whole interval from
age 20 and upwards we get in this way 96 observation
equations from which to determine the correction
factors. The coefficients of these observational
equations are then written down, and
their various products formed in turn. We deem
it not necessary to give all these observational
equations and their coefficients for all the 96
observations, but shall limit ourselves to give all
the necessary computations for the interval from
50-54 as previously considered. With the usual
system of notation employed in the method of
least squares we get the scheme on pages 148-154.

Normal Equations, Michigan Males 1909–1915.

723763  400750  218930  150776  135184  115318   30325  1801152
        877847  253187  176242  149858  129697   34600  2053941
                237159   90440   72317   62110   16246   964843
                        105346   47022   39939   10576   628608
                                 76774   28909    8668   525295
                                         53378    7012   437390
                                                  2391   111625

The  addition  of  the  various  columns  of  the  sum 
products  of  the  coefficients  gives  us  finally  the 
above  set  of  normal  equations  of  which  we  only 
submit  the  coefficients  in  the  usual  scheme  em- 
ployed in  the  method  of  least  squares. 

Solving the above system of normal equations
by means of the well-known method devised by
Gauss, we obtain finally the values on page 154 for
the various a's by which the approximate values
'N must be multiplied in order to yield the probable
values of N.

Sum products of the coefficients (closing columns of the scheme), age interval 50-54:

              gg            gh           gs
             0.0           0.8          0.6
             3.2         188.1         39.6
             1.4          88.3          6.1
             0.8          47.8          0.3
             0.8          46.2         10.3
             0.3          16.7          1.5
            29.2        1740.4         69.7
Sum:      2391.0     -111625.0      -1807.0

                            hh           hs
                          72.3         45.9
                       10920.3       2299.0
                        5416.9        375.4
                        2819.6         15.9
                        2631.7        584.8
                         979.7         93.9
                      103877.3       4157.7
Sum:                 6630212.0     107358.0
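The elimination itself may be sketched as below. The sketch makes two assumptions that are not stated in the text: that the last column of the printed scheme of normal equations is the right-hand side, and that the seven unknowns stand in the order $a_1, a_3, a_4, \dots, a_8$. The coefficients are taken as printed, so the solution is only as good as that transcription, and no claim is made that it reproduces the published correction factors exactly. The function names are hypothetical.

```python
# Upper triangle of the normal equations as printed (row i starts at column i).
UPPER = [
    [723763, 400750, 218930, 150776, 135184, 115318, 30325],
    [877847, 253187, 176242, 149858, 129697, 34600],
    [237159, 90440, 72317, 62110, 16246],
    [105346, 47022, 39939, 10576],
    [76774, 28909, 8668],
    [53378, 7012],
    [2391],
]
RIGHT = [1801152, 2053941, 964843, 628608, 525295, 437390, 111625]


def symmetric_from_upper(upper):
    """Fill out the full symmetric matrix from its upper triangle."""
    n = len(upper)
    m = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            m[i][j] = m[j][i] = float(value)
    return m


def gauss_solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(rhs)
    aug = [row[:] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


matrix = symmetric_from_upper(UPPER)
solution = gauss_solve(matrix, RIGHT)
```

The remaining factor $a_2$ would then follow from relation (12), since it was eliminated before the normal equations were formed.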

Correction Factors, a.

Group I       1.03284
Group II      1.00017
Group III     1.03635
Group IV      1.03731
Group V       1.00956
Group VI      0.97334
Group VIIa    0.90332
Group VIIb    0.60565
Group VIII    1.13743

Applying the above correction factors to the
respective values of 'N, we get finally as the total
areas of the respective component curves:

Group

I             127,131
II            366,059
III           189,699
IV            108,750
V              75,747
VI             68,130
VIIa           33,032
VIIb           12,133
VIII           19,339

Total       1,000,000

Multiplying the equations of the various frequency curves, F(x),
of the percentage distribution in each group with the above
values of N we obtain finally the complete mortality table as
will be given in the Appendix. The final graphical representation
of the frequency curves is shown in Figure 2.

10. GOODNESS OF FIT

This completes the third step in the inductive process. The
fourth and final step is the verification of the results thus
arrived at by a purely deductive process. Here it must be
remembered that the condition which the final component frequency
curves shall fulfill is that the observed proportionate death
ratios shall agree as closely as possible with the expected or
theoretical proportionate death ratios as computed from the final
table. In this connection it must be borne in mind that the
observed proportionate death ratios are given in quinquennial age
groups. Thus the observed proportionate death ratios in a certain
age interval, as for example between 50 and 54, are really the
average or "central" proportionate death ratios at age 52. From
the complete table it is, however, possible to compute the
proportionate death ratios for each specific age. Graphically the
expected proportionate death ratios will therefore represent a
continuous curve, while the observed ratios will be represented
by a rectangular shaped column diagram. Such a graphical
representation is shown in Fig. 3, which simply represents the
figures in Table C and Table E in graphical form. The "goodness
of fit" of the "expected" or theoretical values to the "actual"
or observed values is seen to be very close, especially in the
largest and most important groups. It is only in the combined
groups VIIa and VIIb that the "fit" might probably be open to
criticism for higher ages, but even here the deviation is small
between the actual and
theoretical values. A very small increase in the
area of the VIIb curve would easily adjust this
difference. It is, however, doubtful if such a correction or
adjustment would have any noteworthy effect upon the ultimate
mortality rates q_x, and I do not consider it worth while to go
to the additional trouble of recomputing the areas, especially in
view of the fact that the observation data above the age of 80
are not exact and detailed enough to be used in this method of
curve fitting. For ages up to 70 or 75 I consider, however, the
table as thus constructed as sufficiently accurate for all
practical purposes.
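The comparison described in this section needs the single-age figures of the complete table to be collapsed into the same quinquennial intervals as the observations. A minimal Python sketch of that grouping follows; the two groups "A" and "B" and their deaths at each age are invented for illustration, and `grouped_ratios` is a hypothetical helper name.

```python
def grouped_ratios(deaths_by_group, start_age, width=5):
    """Proportionate death ratios for ages start_age .. start_age + width - 1.

    deaths_by_group maps a group label to a dict {age: deaths at that age}
    taken from the complete (single-age) table.
    """
    ages = range(start_age, start_age + width)
    per_group = {
        label: sum(by_age.get(age, 0.0) for age in ages)
        for label, by_age in deaths_by_group.items()
    }
    total = sum(per_group.values())
    if total == 0:
        raise ValueError("no deaths in this age interval")
    return {label: deaths / total for label, deaths in per_group.items()}


# Invented single-age figures for two groups at ages 50-54.
table = {
    "A": {50: 300, 51: 310, 52: 320, 53: 330, 54: 340},
    "B": {50: 100, 51: 100, 52: 100, 53: 100, 54: 100},
}
expected = grouped_ratios(table, 50)
print(expected)
```

The grouped expected ratios can then be set beside the observed ratios for the same interval, group by group.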

11. MASSACHUSETTS 1914–1917

As another example of the method I take the construction of a
mortality table for the State of Massachusetts from the mortuary
records for the three years 1914, 1915 and 1916. The records as
given by the Registration reports are better than the records for
Michigan, inasmuch as they have avoided the deplorable practice
of grouping all deaths above the age of 80 into a single age
group. On the other hand, the classifications of cause of death
in Massachusetts by attained age are given in ten year age groups
only. Hence it is readily seen that we will only be able to
secure half as many observation equations as in the case of the
five year interval in Michigan. This rather large grouping puts
the method to a severe test. In spite of this drawback I shall
for the benefit of the readers briefly outline the results I have
obtained from an analysis of the Massachusetts data.

While for the Michigan data I employed a system of frequency
curves previously used with success for certain Scandinavian
data, I found it was easier to fit the Massachusetts data to a
system of frequency curves used in the construction of a
mortality table for England and Wales for the years 1911 and 1912
from the mortuary records of deaths by age and cause among male
lives. The classification by age of the causes of death in 8
groups is also different from that of Michigan, especially for
middle life and younger ages. The parameters of the system of
component frequency curves to which I fitted the Massachusetts
data are shown in the following Table F:

Table F.

Parameters of the System of Frequency Curves
for Massachusetts Males 1914–1916.

Group   Mean (years)   Dispersion (years)   Skewness   Excess
I          78.70             7.9775          + .0920   + .0331
II         68.00            12.2051          + .1151   + .0234
III        63.05            13.0532          + .1210   + .0471
IV         60.45            17.8562          + .0983   - .0091
V          49.60            18.6100          + .0328   - .0309
VI         43.80            14.6760          - .0091   - .0272
VIIb       57.40            12.1550          + .0021   - .0026

VIIa and VIIIa constructed from Poisson-Charlier Curves.
The observed number of deaths according to the 8 groups of causes
of death, and their corresponding proportionate death ratios, are
given in the following tables G and H.

By finding first approximate values and then by a further
correction of these approximate areas by means of the factors a,
determined by the method of least squares in exactly the same
manner as demonstrated in the case of Michigan, we finally arrive
at the following areas of the various groups.

Areas of the component frequency curves in the
Life Table for Massachusetts Males, 1914–1916.

Group            Areas
I                90064
II              281470
III             207854
IV              151316
V                99543
VI              107718
VIIa & VIIb      40719
VIIIa            21316
               1000000
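As a small arithmetic check, the areas printed above can be added up and expressed as shares of the radix. The Python sketch below uses only the figures of the table; the variable names are chosen for the example.

```python
# Areas of the component curves for Massachusetts males, as printed above.
areas = {
    "I": 90064, "II": 281470, "III": 207854, "IV": 151316,
    "V": 99543, "VI": 107718, "VIIa & VIIb": 40719, "VIIIa": 21316,
}
radix = 1_000_000

total = sum(areas.values())
# Share of all deaths in the life table that falls to each group.
shares = {group: area / radix for group, area in areas.items()}

print(total)
for group, share in shares.items():
    print(f"{group:12s} {share:.4f}")
```

The areas exhaust the radix, so each area can be read directly as the number of deaths per million entrants attributed to that group of causes.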

Forming the products N F(x) for the various groups and integral
ages we obtain finally the life table as shown in the appendix.
In order to test the "goodness of fit" of the curves it is
necessary to compute the expected or theoretical proportionate
death ratios from this latter table and compare such ratios with
the observed or actual proportionate death ratios as shown in
Table H. The theoretical values are shown in Table I, and a
graphical representation illustrating the "goodness of fit"
between the observed and theoretical ratios is given in Fig. 5.
I think it will be generally admitted that the fit is
satisfactory for all practical purposes.

The State of Massachusetts has always been the foremost state in
the union for reliable and trustworthy statistical records, and
in all probability it would be possible to secure the deaths by
causes in five-year age groups instead of ten-year groups. By
taking the above table as a first approximation one should then
obtain a very accurate table. On the other hand, it is possible
to verify the final results in the above Life Table for
Massachusetts by an entirely different process. It happens that
the State of Massachusetts took a census in April 1915. This
census for living males by attained ages could then be used as an
approximation for the exposed to risk, while the deaths for the
three years could be used as a basis for the number of deaths in
a single year. A Life Table could then be constructed by means of
the orthodox methods usually employed by actuaries and
statisticians in the construction of mortality tables from census
returns.
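The census check just described can be sketched in a few lines of Python. The conversion from the central death rate m to the probability of death q, q = 2m / (2 + m), is the usual textbook approximation and is an assumption here, since the text does not say which formula the orthodox methods would use; the population and death counts are invented.

```python
def q_from_census(deaths_three_years, census_population):
    """Approximate q_x from three years of deaths and one census count.

    The census is taken as a stand-in for the exposed to risk, and one
    third of the three-year deaths as the deaths of a single year.
    """
    annual_deaths = deaths_three_years / 3.0
    m_x = annual_deaths / census_population      # central death rate
    # Assumes deaths are spread evenly over the year of age.
    return 2.0 * m_x / (2.0 + m_x)


# Invented figures for one age: 450 deaths in three years, 15,000 living.
q = q_from_census(deaths_three_years=450, census_population=15000)
print(round(q, 6))
```

Rates obtained in this way, age by age, could then be compared with the q_x column derived from the frequency curves.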


12. LOCOMOTIVE ENGINEERS TABLE 1913–17. OTHER TABLES

As a third illustration, I shall construct a table for American
Locomotive Engineers for the period 1913–1917. The statistical
data forming the basic table are the mortuary records by attained
age and cause of death among the members of The Locomotive
Engineers' Life and Accident Insurance Association, a large
fraternal order of the American Locomotive Engineers. The total
number of deaths in the five year period amounted to more than
4,000. When these were distributed into separate groups of causes
of death, it was found possible to use a system of frequency
curves similar to that employed in the State of Massachusetts,
except for Group No. IV, for which it was found exceedingly
difficult to find a single curve which would fit the data; much
points towards the actual presence of a compound curve for that
group of causes of death among the Locomotive Engineers. The
grouping of causes of death is also slightly different from that
of Michigan and Massachusetts. I shall not go into further
details as to the actual construction of this table, except to
mention the areas of the various component frequency curves, of
which I present the following table.

Group         Areas
I            44,857
II          342,645
III         226,022
IV          147,420
V            47,650
VI           31,260
VIIa         79,005
VIIb         77,713
VIII          3,428
          1,000,000


It must also be remembered that the radix of this table is taken
at age 20, instead of at age 10 as is the case in the preceding
tables. The final graph is shown on the preceding page. A number
of diagrams illustrating the "goodness of fit" are also attached
and need no further comment. It might, however, be of interest to
mention the fact that the American actuary, Moir, has recently
constructed a mortality table for American Locomotive Engineers
along the orthodox lines from the data contained in the
Medico-Actuarial Mortality investigation. Moir's table, or at
least the great bulk of the material from which it was derived,
falls in the interval between 1900 and 1913. Owing to the
energetic "safety first" movement which since 1912 has been
actively pursued by most of the leading American railroads, it
is, however, to be expected that the period 1913–1917 indicates a
reduced mortality as compared with that of Moir's period. This
fact is also shown in the diagrams in Fig. 7.* On the other hand,
the almost parallel movements of Moir's table and the table of
the frequency curve method of 1913–1917 seem to indicate the
soundness of the proposed method.

* Curves I, II and V are Locomotive Engineers' Mortality Tables
for various periods.
12 a. ADDITIONAL TABLES

A fourth table showing mortality conditions among a decidedly
industrial or occupational group has been constructed for coal
miners in the United States. The original data of the deaths by
ages and specific causes were obtained from the records of
several fraternal orders and a large industrial life assurance
company and comprised nearly 1600 deaths. The deaths above the
age of sixty were, however, too few in number to determine with
any degree of exactitude the area of component curves for the
older age groups. For ages below sixty-five the table should on
the other hand give a true representation of the mortality among
coal miners in American collieries during the period under
consideration.* A particular feature of this table is the
comparatively low mortality in group VI, which contains primarily
deaths from tuberculosis. Coal miners present in this respect
different conditions from those usually prevailing in dusty
trades, where the death rate from tuberculosis is unusually high.
The same feature is also borne out in previous investigations on
the death rate of coal miners in England, and by the recent
investigations by Mr. F. L. Hoffman on dusty trades in America.

* It was not possible to separate anthracite and bituminous coal
miners. The data indicate that anthracite mine workers have a
higher accident rate than workers in bituminous mines.

In order to have a measure of the mortality prevailing among
industrial workers in America, we submit a table derived from a
very detailed collection of mortuary records by age, sex and
cause of death as published by the Metropolitan Life Insurance
Company of New York. A deplorable defect in this splendid
collection of data is the grouping together of all ages above
seventy in a single age group, which makes it almost impossible
to determine the component curves for higher ages with any degree
of trustworthiness.

The defect in the original Metropolitan data for older age groups
made it necessary to modify the earlier sets or families of
curves which were used on the Michigan and Massachusetts data and
to combine several of the subsidiary component curves, especially
those for the older age groups. Such modifications were, however,
easily performed by means of simple logarithmic transformations.

I give below my grouping scheme for the Metropolitan data
designated by the code numbers of the international list of
causes of death. The actual cause of death corresponding to each
code number is found under paragraph 8 of the present chapter.
GROUP I
10, 39 to 46, 48, 50, 54, 63 b, 64 to 66, 68, 79, 81,
82, 89 to 91, 94, 96, 97, 103, 105, 109 a, 120, 123, 124,
126, 127, 142, 154.

GROUP II
4, 13, 14, 18, 26, 27, 32 to 35, 47 (over age 20), 49,
51 to 53, 55, 60, 62, 70 to 72, 77, 78, 80, 83 to 88, 92,
95, 98 to 102, 106, 107, 109 b, 110 to 119, 122, 125, 143
to 145, 148, 149, 155 to 163.

GROUP III
28, 29, 31, 37, 38, 56 to 59, 67.

GROUP IV a AND IV b
1, 5 to 9, 17, 19, 20 to 25, 30, 61, 63 a, 73 to 76, 108,
146, 147, 150, 164 to 186, 47 (under age 20).

It  will  be  noted  that  under  this  scheme  Group  I 
includes  practically  Groups  I  to  III  of  the  Michigan 
classification.  Group  II  corresponds  partly  to  IV  and 
V  for  Michigan,  Group  III  is  practically  Michigan's 
Group  VI,  while  Group  IV  a  and  IV  b  takes  in  partly 
V,  VII,  and  VIII  in  the  Michigan  experience.  As  a 
further  correction  I  found  it  also  advisable  to  transfer 
some  of  the  deaths  in  the  age  intervals  10—14,  15 — 19, 
20—24,  and  25—29  in  Groups  I  and  II  to  Group  IV  a 
so  as  to  avoid  the  long  left  tail  ends  in  these  older 
age  curves. 
After grouping the deaths (more than 200,000) of the Metropolitan
experience according to the above scheme, it is a simple matter
to compute the various values of R(x) of the four groups for
quinquennial age intervals and use these values (altogether 52 in
number) for finding the observation equations and in the
subsequent determination of the component curves as shown in the
final mortality table in the appendix to this chapter. A
comparison between the observed values of R(x) by quinquennial
ages and the continuous values of R(x) (indicated by dotted
curves) as computed from the final mortality table is shown in
Fig. 9. The "fit" between calculated and observed values is
evidently satisfactory.

Fig. 9.

A most instructive and unique experience is offered in the table
of Japanese Assured Males for the four year period 1914-1917,
based upon the death records of more than a dozen of the leading
Japanese Life Assurance Companies. About 35,000 deaths by cause,
arranged in quinquennial age groups, were available for this
construction. The component curves for the older age groups were
determined by a simple logarithmic transformation of the variates
and offered no particular obstacles in the a priori determination
of the parameters. The curves for middle and younger life were
more difficult to handle, especially the curves typical of
tuberculosis, spinal meningitis and the peculiar Oriental disease
known as Kakke, arising from an excessive rice diet. A first
attempt to use the same curve types as employed in some of the
European and American data resulted in a very poor fit between
the observed and calculated values of R(x) for the younger age
intervals, clearly indicating that the clustering tendencies in
the Japanese data were different from those in the other
experiences I had previously dealt with.
The peculiar form of the observed values of R(x) for the
tuberculosis group indicated beyond doubt that the frequency
curve for this group itself was a compound curve. I therefore
decided to include both spinal meningitis and kakke with the
tuberculosis group, and treat this new group as a compound
frequency curve with two components. By successive trials I
finally succeeded in establishing a complete curve system which
satisfied the ultimate requirement of the fit between the
observed and calculated values of R(x) for the various groups.¹

Grouping of Causes of Death in Japanese Assured
Males 1914–1917.

GROUP I
Diseases of Arteries, Senility, Influenza, Cerebral Hemorrhage,
Acute and Chronic Bronchitis, Broncho-pneumonia.

GROUP II
Asthma and Pulmonary Emphysema, Cancer (all forms), Tumor,
Diabetes, Other Diseases of Body, Paralytic Dementia, Tabes
Dorsalis, Diseases of other organs for circulation of Blood,
Chronic Nephritis, Other Diseases of Urinary Organs.

GROUP III
Mental Diseases, Other Diseases of Spine and Medulla Oblongata,
Other Diseases of Nervous System, Diseases of Cardiac Valves,
Pneumonia, Pleurisy, Other Respiratory Diseases, Gastric Catarrh,
Ulcer of Stomach, Hernia, Other Diseases of Stomach, Diseases of
Liver, Acute Nephritis, Diseases of Skin and Diseases of Motor
Organs.

GROUP IV a AND IV b
Typhoid Fever, Malaria, Cholera, Acute Infectious Diseases,
Peritonitis, Suicide, Dysentery, Tuberculosis (all forms),
Syphilis, Kakke, Meningitis, Inflammation of the Caecum, Death by
external causes (accidents, etc.).

¹ See Addenda for the final table.

Arranging the collected Japanese statistics on causes of death
among assured males by attained age at death in accordance with
the above scheme of grouping, using a 5 year interval as the
unit, we obtain the following double entry table for the 35207
deaths as used in my computation for the various values of R(x).

Ages     Group I   Group II   Group III   Group IV   Total
10–14        3         4          37          79       123
15–19       17        23         216         714       970
20–24       37        65         181        1640      1923
25–29       62       109         324        1975      2470
30–34      124       257         800        1993      3174
35–39      278       480        1147        2065      3970
40–44      449       662        1299        1674      4084
45–49      701       957        1352        1482      4491
50–54      742       959        1115         990      3806
55–59      864      1045        1041         728      3678
60–64      865       847         874         482      3068
65–69      626       571         612         186      1995
70–74      399       268         347          80      1094
75–79      123        76         100          20       319
80–84       16        13          10           3        42

The observed values of R(x) as derived from the above table are
shown in the staircase shaped histograph in Fig. 10. The
corresponding values of R(x) as calculated from the final
mortality table are shown as dotted curves on the same diagram.
The "fit" between observed and calculated values of R(x) is
evidently satisfactory except for the youngest age intervals.
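The observed values of R(x) are simply the shares of each group in the deaths of an age interval. As an illustrative Python sketch, the ratios for three of the intervals in the table above can be computed as follows; the figures are those printed in the table, while the names `deaths`, `proportionate_ratios` and `observed_R` are chosen for the example.

```python
# Deaths in Groups I, II, III and IV for three age intervals of the table.
deaths = {
    "20-24": (37, 65, 181, 1640),
    "50-54": (742, 959, 1115, 990),
    "70-74": (399, 268, 347, 80),
}


def proportionate_ratios(row):
    """Share of each group in all deaths of one age interval."""
    total = sum(row)
    return tuple(d / total for d in row)


observed_R = {ages: proportionate_ratios(row) for ages, row in deaths.items()}

for ages, ratios in observed_R.items():
    print(ages, " ".join(f"{r:.3f}" for r in ratios))
```

Group IV accounts for roughly 85 per cent of the deaths at ages 20 to 24 but only about 7 per cent at ages 70 to 74, which shows how strongly the ratios shift with age.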

The construction of the present Japanese table constitutes
probably the most severe trial to which the proposed method has
hitherto been put. We are here dealing with an entirely different
race living under different economic conditions than the nations
of Europe and America and afflicted with certain forms of
diseases which are comparatively rare or unknown among the
Western nations.

It is therefore gratifying to note that the eminent Japanese
actuary, Mr. T. Yano, in comparing the above mentioned table with
an investigation he made on the aggregate mortality in 1913-1917
of all the Japanese life assurance companies (about 45 in number)
from the actual number of lives exposed to risk at various ages,
has been able to test independently the validity of the proposed
method to complete satisfaction. (See remarks in preface.)

Fig. 10. Japanese Life Table (horizontal axis: age at death).

13. CRITICISMS AND SUMMARY

With these remarks I shall close the mere technical discussion of
the proposed method and turn my attention to the arguments
advanced by certain American critics against the possibility of
constructing mortality tables from records of death alone. I deem
no apology necessary to meet those critics and give a brief
historical sketch of the origin of the proposed method, because
remarks along this line will tend to accentuate the difficulties
the mathematically trained biometrician has to contend with in
obtaining a hearing among the present day school of actuaries and
statisticians.

A good many critics, among whom I may mention Mr. John S.
Thompson and Mr. J. P. Little, apparently have received an
erroneous impression of the fundamental processes of the proposed
method and its evident departure from the conventional methods.
Mr. Thompson states: "If we understand the process, the result is
simply a graduation of "dx", the "actual" deaths, and it is not
apparent why a mortality table should not be formed from the
unadjusted deaths and some other function of graduation with
equally good results."¹ From this it would appear that Mr.
Thompson is of the opinion that I have graduated the deaths as
actually observed. As anyone who will take the trouble to read
the above article can see, this is not the case. The actually
observed numbers of deaths have only been used to construct the
observed proportionate death ratios.²

¹ Proceedings of the Casualty Actuarial and Statistical Society
of America, Vol. IV, Pages 399–400.

² These objections by Thompson and Little are shown in their full
obscurity in the case of the tables for Locomotive Engineers,
Coal Miners and Japanese Assured Males, where the greatest number
of observed deaths fell between ages 36–49.

The whole process may be summarized as follows:

1) The choice (a priori) of a system of frequency curves based
upon the hypothesis that the distribution of deaths according to
age from typical causes of death can be made to conform to those
postulated frequency curves whose parameters are known or chosen
beforehand.

2) The grouping of causes of death so as to conform with the
above mentioned system of frequency curves.

3) The computation for each age or age group of the proportionate
death ratios of such groups from the collected statistical data
of deaths by age and by cause of death.

4) The choice of approximate values of the areas of the various
component frequency curves. Such approximate values can be
determined by inspection or by simple linear correlation methods.

5) The determination by means of the theory of least squares of
the various correction factors a with which the approximate
values of the areas must be multiplied in order that we may
obtain the probable values of the areas of the component curves.
The observation equations necessary for this computation are
obtained from the observed proportionate death ratios, which are
independent of the exposed to risk.

6) The subsequent calculation of the products NF(x) for all
groups and for all integral ages. This gives us again the total
number dying from all causes at integral ages among the original
cohort of 1,000,000 entrants at age 10, in other words the dx
column from which the final mortality table can be constructed.

7) The computation of the "expected" or theoretical proportionate
death ratios from the final table and their subsequent comparison
with the "actual" or observed proportionate death ratios to
illustrate the "goodness of fit".

It is this last step which constitutes the verification of the
results derived by means of a purely deductive or mathematical
process, and it is a test of very stringent requirements. It is
required, namely, that there must be a simultaneous "fit", not
alone for all groups of causes of death, but for all age
intervals as well.
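Step 6 of the summary, passing from the dx column to the finished table, can be illustrated with a short Python sketch. The four dx values are invented and cover only ages 10 to 13; `life_table_from_deaths` is a hypothetical helper name, and the radix of 1,000,000 at age 10 follows the text.

```python
def life_table_from_deaths(d_x, first_age, radix=1_000_000):
    """Build l_x and q_x from a d_x column.

    d_x[i] is the number dying at age first_age + i out of `radix`
    entrants; l_x is the number still living at the start of each age.
    """
    rows = []
    living = float(radix)
    for i, deaths in enumerate(d_x):
        if living <= 0:
            break
        rows.append({
            "age": first_age + i,
            "l_x": living,
            "d_x": deaths,
            "q_x": deaths / living,   # probability of dying within the year
        })
        living -= deaths
    return rows


# Invented d_x values for ages 10-13 (a real table runs to the limiting age).
rows = life_table_from_deaths([2000, 2100, 2300, 2600], first_age=10)
for row in rows:
    print(row["age"], int(row["l_x"]), row["d_x"], round(row["q_x"], 6))
```

In the method described above each dx would itself be the sum of the products NF(x) over all groups of causes of death.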

The sole justification of the proposed method hinges indeed upon
the validity of the hypothesis. Is it indeed possible to choose a
priori a system of frequency curves to which to fit our observed
data? Theoretically speaking each population or sample
population, as for instance certain occupational groups such as
locomotive engineers, farmers, textile workers, miners, etc.,
will in all probability have its own particular system of
frequency curves. From a purely practical point of view, and this
is the one in which we are chiefly interested, we may, however,
easily get along with a limited system of frequency curves for
the various groups of causes of death and limit ourselves to a
comparatively few sets of frequency curves to which to fit our
statistical data. The case is analogous to that confronting a
manufacturer of shoes. Undoubtedly the foot of one individual is
different in form from that of any other individual, and in order
to get an absolutely faultlessly fitting boot we would all have
to go to a custom boot maker. Practical experience shows,
however, that it is possible to manufacture a few sizes of boots,
say 6's, 7's, 8's and intermediate sizes in quarters and halves,
so as to fit to complete satisfaction the footwear of millions of
people. Exactly in the same manner I have found from a long and
varied experience in practical curve fitting that it is possible
to fit the mortuary records of male deaths by attained age and
cause of death to a comparatively limited number of sets of
component curves, say not more than 5 or 6 sets. Moreover, if in
a certain sample population a certain curve should not exhibit a
satisfactory fit it is indeed a simple matter to change its
parameters so as to improve the fit.

14     ADDITIONAL      "'"^  regard  to  the  classification 
pmNciPLES^oF     o^  *h^  causes  of  death  into  a 
METHOD  limited   number   of   groups    it 

seems  that  some  of  the  critics  of  the  method  are 
of  the  opinion  that  this  classification  is  ironclad 
and  fixed.  This,  however,  is  not  the  case.  While 
in  a  specific  sample  population  a  certain  cause  of 
death  might  fall  in  group  II,  it  is  quite  likely 
that  the  same  cause  of  death  would  come  under 
another  group  in  another  sample  population.  For 
instance,  the  deaths  from  asthma  are  in  Michigan 
grouped  under  Group  II.  In  the  case  of  Coal 
Miners  such  deaths  would,  however,  go  into  group 
IV or group V. If the classification of causes of
death were fixed, the frequency curves for separate
populations would show great variations, and it
would be out of the question to limit ourselves to
a small set of systems of component curves. By making
the classification flexible, we are, on the other
hand, in a better position to proceed with a smaller
number of curves. For instance, in order to use
the postulated frequency curve for Group VI for
Michigan it was necessary to place the cause of
death listed as No. 186 (other accidental traumatism)
of the International Classification of
Causes of Death in that group instead of in group
V or VII, where most deaths of this type are ordinarily
classed.

It  would  be  interesting  to  see  to  what  extent 
the  proposed  classification  and  the  chosen  system 
of frequency curves in Michigan deviate from
the  theoretically  exact  system  of  frequency  curves. 
In  the  case  of  Michigan  it  would  be  impossible  to 
test  this.  An  approximate  test  might  be  obtained 
from  the  Michigan  mortality  data  for  the  three 
year  period  1909 — 1911.  Professor  Glover  has  con- 
structed a  mortality  table  for  males  in  the  State 
of  Michigan  in  this  three-year  period  by  means 
of  the  usual  methods  employed  by  actuaries  by 
resorting  to  the  exposed  to  risk.  Starting  with  a 
radix of 1,000,000 at age 10 it is possible to break
up the deaths, or the $d_x$ column of the Glover
table, into a set of subsidiary columns of deaths
from groups of causes of death in the same order
as given in Table A on page 133 by means of a
simple application of the observed proportionate
mortality ratios as derived from the 1909–1911
period. On the basis of a radix of 1,000,000 survivors
at age 10 we find that according to the
Glover Table, 5016 will die in the interval from
50–54. Let us also suppose that the proportionate
mortality ratio in group III for ages 50–54*
amounted to 0.23; then the number of deaths from
group III in that particular interval in the Glover
table would be 5016 x 0.23 = 1154. Similar numbers
could be found for the other groups and for
arbitrary age intervals, and we would in this manner
have an empirical representation of the frequency
curves. This aspect of the matter is treated
in brief form on another page.
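
The breaking up of a $d_x$ entry by proportionate mortality ratios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_deaths` is hypothetical, and the entry "other groups" is not from the text but simply lumps together the remaining share of 0.77, since only the ratio for group III is quoted above.

```python
def split_deaths(total_deaths, ratios):
    """Break a total number of deaths into groups by proportionate mortality ratios."""
    # The ratios of all groups in one age interval must account for every death.
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("proportionate mortality ratios must sum to 1")
    return {group: round(total_deaths * r) for group, r in ratios.items()}

# Figures from the text: 5016 deaths at ages 50-54, ratio 0.23 for group III.
# "other groups" is a hypothetical lump for the remaining 0.77.
parts = split_deaths(5016, {"III": 0.23, "other groups": 0.77})
print(parts["III"])  # deaths assigned to group III
```

Repeating the call for each age interval of the table would give the rough frequency diagrams of the separate groups.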

Returning now to our original discussion, it will
readily be admitted that the method of constructing
mortality tables by means of compound frequency
curves cannot be considered as absolutely
rigorous from the standpoint of pure mathematics.
But neither can the usual methods of constructing
mortality tables by graduation processes, whether by
analytical formulas, mechanical interpolation formulas
or a simple graphical process, be considered
as mathematically exact. All statistical methods
are, in fact, approximation processes. In the
greater part of the realm of applied mathematics
we have to resort to such approximation processes.
It is thus absolutely impossible to solve exactly
by ordinary algebraic processes general equations
of higher degree than the fourth. We encounter,
however, in every day practice innumerable instances
in which an approximation process, as for
instance Newton's or Horner's methods or the
method of finite differences, is sufficiently close to
determine the roots of any equation so as to satisfy
all practical requirements.
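
As an illustration of such an approximation process, the following Python sketch applies Newton's method to an equation of the fifth degree. The equation $x^5 - x - 1 = 0$, the starting value and the tolerance are choices made for this example only and do not come from the text.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is negligible."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Example quintic (an assumption for illustration): x^5 - x - 1 = 0.
f = lambda x: x**5 - x - 1
df = lambda x: 5 * x**4 - 1

root = newton(f, df, x0=1.5)
print(root)  # approximate real root, close to 1.1673
```

A handful of iterations suffices here, which is the practical point made above: the root is never obtained in closed form, yet it is known as closely as any application requires.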

From  this  point  of  view  I  claim  that  the  pro- 
posed method  in  the  hands  of  adequately  trained 
statisticians  will  yield  satisfactory  results,  and  I 
am  inclined  to  think  that  the  results  are  probably 
as  true  as  the  ones  obtained  by  means  of  the  usual 
methods,  which  especially  in  the  case  of  gradua- 
tion by  interpolation  formulas  often  are  affected 
with  serious  systematic  errors.  Moreover,  there 
are  sound  philosophical  and  biological  principles 
underlying  the  proposed  method,  which  is  perhaps 
more  than  can  be  said  about  the  usual  methods, 
purely  empirical  in  scope  and  principle.  On  the 
other  hand,  I  will  readily  admit  that  the  proposed 
method  is  by  no  means  a  simple  rule  of  the  thumb 
and  it  can  under  no  circumstances  be  entrusted  to 
the hands of amateurs. The whole process can in
my  opinion  only  be  employed  when  placed  in  the 
hands  of  the  adequately  trained  statistician  who  is 
thoroughly  familiar  with  his  mathematical  tools, 
as  provided  in  the  formulas  from  the  probability 
calculus.  Such  adequate  training  is  not  acquired 
over  night,  but  only  through  a  long  and  patient 
study.  Meticulous  and  patient  work  is  often  re- 
quired before  one  is  finally  brought  upon  the  right 
track,  especially  in  the  classification  of  the  causes 
of  death.  Failure  upon  failure  is  oftentimes  en- 
countered by  the  beginner  in  this  work,  and  it  is 
probably  only  through  such  failures  that  the  in- 
vestigator is  enabled  to  avoid  the  pitfalls  of  the 
often  treacherous  facts  as  disclosed  by  statistical 
data  and  steer  a  clear  course.  Mathematical  skill 
is  only  acquired  through  a  long  and  careful  study. 
The illustrious saying of the Greek geometer,
Euclid, who once told the Ptolemaic king
that "there is no royal road in mathematics" holds
as true to-day as it did in the days of antiquity.

The  fact  that  the  method  is  no  simple  mechani- 
cal rule,  but  one  which  can  be  entrusted  into  skill- 
ful hands  only,  is,  moreover,  in  my  opinion,  one 
of its strong points, because it eliminates all attempts
of dilettantes to make use of it. A large
manufacturing  plant  would  not,  for  instance,  put 
an  ordinary  blacksmith  or  horseshoer  to  work  on 
making the fine tools for certain parts of automatic
machinery employed in the manufacture of
staple  articles.  Only  the  most  skilled  and  highly 
trained  tool  makers  are  able  to  produce  machine 
parts,  which  often  require  precision  measurements 
running  into  one  thousandth  part  of  an  inch.  Nor 
would  a  large  contracting  firm  dream  of  putting 
a  backwoods  carpenter  in  charge  of  the  construc- 
tion of  a  skyscraper.  Yet,  this  case  is  absolutely 
analogous  to  that  of  letting  the  mere  collector  of 
crude  statistical  data  make  an  analysis  and  draw 
conclusions  from  certain  collected  facts  as  ex- 
pressed in  statistical  series  of  various  sorts. 

While  some  American  critics  to  all  appearances 
have  misunderstood  the  principles  underlying  the 
method,  several  European  reviewers  of  the  short 
summary  of  the  method  as  originally  published  in 
the  "Proceedings  of  the  Casualty  Actuarial  and 
Statistical  Society  of  America"  evidently  have  un- 
derstood its  fundamental  principles  completely. 
The  European  critics  seem,  however,  to  be  of  the 
opinion  that  there  is  a  rather  prohibitive  amount 
of  arithmetical  work  involved  in  the  actual  con- 
struction of  the  mortality  table.  Thus  a  review  in 
the  Journal  of  the  Royal  Statistical  Society  for 
May 1918 has this to say:

"Mr.  Fisher's  object  is  to  construct  a  life 
table,  being  given  only  the  deaths  at  ages  and 
not the population at risk. The hypothesis
employed is that the total frequency of deaths
can be resolved into specific groups of deaths,
the frequencies of which cluster around certain
ages. The parameters of these sub-frequencies
having been determined, the areas
are deduced from a system of frequency curves
of the form:

$$R_B(x) = \frac{N_B F_B(x)}{N_B F_B(x) + N_C F_C(x) + \cdots + N_K F_K(x)}$$

where $R_B(x)$ is the proportional mortality at
age $x$ of deaths due to causes in group $B$ and
$F_B(x)$ is obtained from the equation of the
sub-frequency curve for cause $B$, while
$N_B + N_C + N_D + \cdots + N_K = 1{,}000{,}000$. The
values of $R(x)$ provide a system of observational
equations from which (by least squares)
the values of $N_B$, &c., can be obtained.

"Since  particularly  in  industrial  statistics, 
or  in  general  statistical  inquiries  under  war 
conditions  it  is  easier  to  obtain  accurate  data 
of  deaths  at  ages  than  of  exposed  to  risk,  the 
success  of  the  method  is  encouraging.  It  is, 
however, to be noted that the amount of arithmetical
work involved is considerable. Quite
apart  from  the  determination  of  the  para- 
meters of  the  frequency  curves,  the  formation 
and  solution  of  the  normal  equations  needed 
to  compute  the  areas  is  a  heavy  piece  of  work. 
It  would  be  of  interest  to  see  whether  the  re- 
solution into  but  three  components  effected  by 
Professor   Karl   Pearson   in    his   well-known 
essay published in the 'Chances of Death'
could  be  made  to  describe  with  sufficient  ac- 
curacy an  ordinary  tabulation  of  deaths  from 
age  10  onwards  to  lead  to  approximately  cor- 
rect results  for  life  table  purposes.  The  test 
should,  of  course,  be  made  with  mortality 
data  derived  from  a  population  very  far  from 
being  stationary  and  the  deductions  compared 
with  the  results  of  standard  methods.  The 
subject  is  one  of  peculiar  interest  at  the  pre- 
sent time." 

From  the  above  quotation  it  is  evident  that  this 
English  reviewer  has  a  clear  conception  of  the 
fundamental  principles  upon  which  the  method  is 
based.  His  criticism  is  mainly  directed  against 
the  heavy  piece  of  arithmetical  work  involved. 
This  work  can,  however,  not  be  compared  with 
the  much  more  difficult  task  of  obtaining  the  ex- 
posed to  risk  at  various  ages,  which  under  all  cir- 
cumstances would  take  much  greater  time  and  be 
infinitely  more  costly,  in  fact  be  absolutely  pro- 
hibitive from  a  financial  point  of  view.  I  wish  in 
this  connection  to  state  that  the  whole  arithmeti- 
cal work  involved  in  the  construction  of  the  Michi- 
gan table  was  done  by  two  computers  in  less  than 
70  hours,  while  the  corresponding  table  for  Mas- 
sachusetts took  about  75  hours.  I  do  not  know  if 
this  can  be  called  exactly  prohibitive. 

In regard to the remarks of my British critic
concerning the Pearsonian method I might add
that in my first attempt at an analysis of mortality
conditions along the lines described above I
tried to subdivide the causes of death into four
groups. It was, however, found that this was not
always sufficient to describe the frequency distribution
of the number of deaths around certain
ages. I doubt whether it is at all possible to describe
the frequency distribution in the various sub-groups
by a system of normal curves, which, of
course, would somewhat lessen the work. I have
made attempts to do this, but so far I have not
been successful except in a few cases.¹ It might
be possible that we should succeed in this if we
first set up a hypothetically determined curve of
the numbers exposed to risk. Such a curve might,
for instance, be a normal curve. Personally, I believe
that little would be gained by such a procedure.
More fruitful appears an analysis by means
of correlation surfaces. The mortality table constructed
by the process as I have described it constitutes
in its final form a correlation surface,
wherein the age at death and the group of causes
of death are the independent variables, and the
number of deaths at a certain age and from a
certain group of causes of death is the numerical
value of the correlation function of the two variates.
Provided one could obtain an exact equation
of such a correlation surface, it would be a
simple matter to construct a mortality table, and
I hope that some statistician may in the future be
induced to attempt a solution of the problem in
this light.

¹ See Addenda for the Metropolitan Table and the
Japanese Table.

15. ANOTHER APPLICATION OF THE FREQUENCY CURVE METHOD

Before closing the discussion of this subject we shall, however,
give a brief description of another
application of compound frequency curves in
the  construction  of  mortality  tables.  We  have  here 
reference  to  the  use  of  skew  frequency  curves  in 
the  graduation  of  crude  mortality  rates  as  com- 
puted in  the  usual  empirical  manner  as  the  ratio 
of  deaths  to  the  number  of  lives  exposed  to  risk 
at various ages. On page 165 it was mentioned
that  the  State  of  Massachusetts  took  a  census  in 
April  1915.  This  census  together  with  the  deaths 
for  the  triennial  period  from  1914 — 1916  makes 
it  an  easy  matter  to  construct  a  mortality  table  in 
the  conventional  manner.  Moreover,  such  a  table 
can  be  compared  with  the  previously  constructed 
table  from  mortuary  records  by  sex,  age  and  cause 
of  death  only  and  shown  in  the  appendix. 

In this connection it might be worth mentioning
that my first table for Massachusetts as constructed
by compound frequency curves was prepared
during the summer of 1918 and first presented
in a series of lectures delivered at the
University of Michigan during the month of
March 1919, while the final official report of the
1915 Massachusetts census did not come into the
hands of the present writer before May 1919.

The official census of the population of Massachusetts
by sex and single ages is given on page
478 in Vol. III of the Massachusetts report from
which Fig. 11 has been constructed. It is seen
from a mere glance at this graph that there is an
unduly  high  tendency  among  the  figures  to  cluster 
around  ages  being  multiples  of  5.  This  tendency 
is  especially  marked  in  the  age  interval  30 — 60 
and  presents  a  defect  which  is  of  no  small  im- 
portance in  the  construction  of  a  mortality  table 
by  means  of  the  conventional  methods.  It  is  in- 
deed doubtful  if  a  table  constructed  from  data 
so  greatly  influenced  by  observation  errors  and 
misstatements  of  ages  can  be  considered  as  ab- 
solutely trustworthy.  On  the  other  hand  the  data 
ought  to  be  sufficiently  exact  to  test  the  results 
arrived  at  by  the  proposed  method  of  compound 
frequency  curves. 

We  give  below  the  male  population  in  5  year 
age  groups  for  the  middle  census  year  of  1915 
and  the  corresponding  deaths  from  all  causes 
during the triennial period 1914–1916.

MASSACHUSETTS

1915 Male Population and Number of Deaths
among Males from 1914–1916.

Ages          Population, L_x    Deaths 1914–16, D_x

5–9               169010              1715
10–14             152419              1004
15–19             154773              1537
20–24             171961              2353
25–29             171017              2726
30–34             149294              2979
35–39             142617              3535
40–44             125462              4007
45–49             107909              4393
50–54              89490              5026
55–59              65133              5459
60–64              49079              5679
65–69              34790              6027
70–74              23638              5946
75–79              13724              4752
80–84               6494              3166
85–89               2479              1751
90–94                530               540
95–99                124               133
100 & over            12                23

A few small discrepancies will be found to exist
between this table and the table printed on page
163, giving the observed deaths from various
causes in ten year age intervals. This arises solely
from the fact that a number of deaths were recorded
where the contributing cause was unknown
and could, therefore, not be distributed in their
proper groups. But this defect is of no influence
in the construction of a mortality table by means
of the method of compound frequency curves, unless
all the causes reported as unknown should
happen to belong to the same group, which hardly
can be assumed to be the case. At any rate the
proportionate death ratios which are the keystone
in this method of construction are for practical
purposes left unaltered whether we include or exclude
these few deaths from unknown causes. In
the usual way of constructing tables from exposures
and number of deaths it is on the other
hand absolutely essential to include all deaths as
otherwise the death rate will be underestimated.
Bearing these facts in mind we therefore refer
to the above figures of $L_x$ and $D_x$ for Massachusetts
Males from which we without further difficulty
can construct an empirical mortality table,
either by graphic methods or by simple summation
or interpolation formulas. There is indeed no
dearth of such formulas, of which a large number
have been devised by Milne, Wittstein, Woolhouse,
Higham, Sprague, Hardy, King, Spencer, Henderson,
Westergaard, Gram, Karup and several
other investigators. In the following computation
I have used a formula originally devised by the
Italian statistician, Novalis, and later on somewhat
modified by the English actuary, King.
The following schedule shows the actual process
in detail.
MASSACHUSETTS MALES.

A. Population.

Graduated Quinquennial Pivotal Values.

Ages       Population L_x     ΔL_x      Δ²L_x    Age   Graduated Population

5–9           169010        -16591
10–14         152419         +2354     +18945     12        29332
15–19         154773        +17188     +14834     17        30836
20–24         171961          -944     -18132     22        34537
25–29         171017        -21723     -20779     27        34369
30–34         149294         -6677     +15047     32        29739
35–39         142617        -17155     -10478     37        28607
40–44         125462        -17553       -398     42        25095
45–49         107909        -18419       -866     47        21587
50–54          89490        -24357      -5938     52        17946
55–59          65133        -16054      +8293     57        12961
60–64          49079        -14289      +1765     62         9802
65–69          34790        -11152      +3137     67         6933
70–74          23638         -9914      +1238     72         4717
75–79          13724         -8130      +1884     77         2731
80–84           6494         -4015      +4115     82         1265
85–89           2479         -1949      +2066     87          480
90–94            530          -406      +1543     92          104
95–99            124          -112       +294     97           23
100–104           12                              102            1

Graduated Population $= u_{x+7} = 0.2\,L_{x+5} - 0.008\,\Delta^2 L_x$
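
The pivotal-value rule of the schedule can be sketched in Python as follows. This is a minimal illustration, assuming the rule reads as one fifth of the quinquennial group total less 0.008 times the central second difference of the group totals; the function name `pivotal_values` is hypothetical. The four population figures are taken from the schedule.

```python
def pivotal_values(groups):
    """Pivotal value at the central age of each interior five-year group.

    groups: list of (first_age, total) for consecutive five-year age groups.
    Returns a list of (central_age, pivotal_value).
    """
    out = []
    for i in range(1, len(groups) - 1):
        prev_total = groups[i - 1][1]
        age, total = groups[i]
        next_total = groups[i + 1][1]
        # Second difference centred on the group being graduated.
        second_diff = next_total - 2 * total + prev_total
        out.append((age + 2, 0.2 * total - 0.008 * second_diff))
    return out

# Male population of Massachusetts, 1915, groups 10-14 to 25-29 (from the schedule).
population = [(10, 152419), (15, 154773), (20, 171961), (25, 171017)]
pivots = pivotal_values(population)
for age, value in pivots:
    print(age, round(value))
```

For ages 17 and 22 this gives 30836 and 34537, in agreement with the schedule.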
B. Deaths 1914–1916.

Graduated Quinquennial Pivotal Values.

Ages       No. of Deaths D_x    ΔD_x     Δ²D_x    Age   Graduated Deaths

5–9             1715            -711
10–14           1004            +533     +1244     12        200.8
15–19           1537            +816      +283     17        307.4
20–24           2353            +373      -443     22        470.6
25–29           2726            +253      -120     27        545.2
30–34           2979            +556      +303     32        595.8
35–39           3535            +472       -84     37        707.0
40–44           4007            +386       -86     42        801.4
45–49           4393            +633      +247     47        878.6
50–54           5026            +433      -200     52       1005.2
55–59           5459            +220      -213     57       1091.8
60–64           5679            +348      +128     62       1125.8
65–69           6027             -81      -429     67       1205.4
70–74           5946           -1194     -1113     72       1189.2
75–79           4752           -1586      -392     77        950.4
80–84           3166           -1415      +171     82        633.2
85–89           1751           -1211      +204     87        350.2
90–94            540            -407      +804     92        108.0
95–99            133            -110      +297     97         26.6
100–104           23                              102          4.6

In this manner we obtain the graduated quinquennial
pivotal values of the population and of
the deaths for ages 12, 17, 22, 27, ... etc. Then
by dividing one third of the graduated deaths by
the population we have the graduated pivotal
values of the so-called "central death rates", or
$m_x$, for quinquennial ages from age 12 and up.
From these values of $m_x$ we easily find the corresponding
values of $q_x$ by means of the formula:

$$q_x = \frac{2 m_x}{2 + m_x}$$
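
The two steps just described, from graduated deaths and population to $m_x$ and from $m_x$ to $q_x$, may be sketched in Python as below. The function names are hypothetical, and the numerical input (600 deaths in three years among a population of 10,000) is invented for the illustration and is not taken from the schedules.

```python
def central_death_rate(graduated_deaths_3yr, graduated_population):
    """Central death rate m_x: one third of the triennial deaths over the population."""
    return (graduated_deaths_3yr / 3.0) / graduated_population

def q_from_m(m):
    """Probability of dying within the year, q_x = 2 m_x / (2 + m_x)."""
    return 2.0 * m / (2.0 + m)

# Hypothetical pivotal values, for illustration only.
m = central_death_rate(600.0, 10000.0)
q = q_from_m(m)
print(m, q)
```

Since the denominator exceeds 2, $q_x$ always falls slightly below $m_x$, and the difference grows with the level of mortality.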

We give below the results of this computation.

Massachusetts Males 1914–1916.

Age    1000 q_x from Novalis' Formula

12          2.21
17          3.33
22          4.64
27          5.29
32          6.68
37          8.25
42         10.65
47         13.53
52         18.67
57         26.38
62         38.29
67         58.12
72         81.90
77        109.91
82        165.02
87        240.18
92        325.64
The intervening values of $q_x$ are without difficulty
derived by interpolation formulas or by a
graphical process. Once having all the values of
$q_x$ for separate ages from age 10 and up it is a
simple matter to form tables of $l_x$ and $d_x$ commencing
with a radix of 1,000,000 at age 10. Without
going into tedious details we present the following
values of $l_x$ for decennial ages.

Massachusetts Males 1914–1916.

Age        l_x          Ages          Σd_x

10      1,000,000      10–19         27,700
20        972,300      20–29         47,330
30        924,970      30–39         66,750
40        858,220      40–49         98,650
50        759,570      50–59        153,900
60        605,670      60–69        233,150
70        372,520      70–79        237,130
80        135,390      80–89        124,760
90         10,640      90 & over     10,640
100            32
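
The passage from the values of $q_x$ to the columns $l_x$ and $d_x$ rests on the relations $d_x = l_x q_x$ and $l_{x+1} = l_x - d_x$. A minimal Python sketch is given below; the function name `life_table` is hypothetical and the three rates are invented for the illustration, not taken from the table.

```python
def life_table(qs, radix=1_000_000):
    """Build l_x and d_x from q_x for consecutive single ages.

    qs: list of q_x values, the first belonging to the radix age.
    Returns (l, d), where l has one more entry than d.
    """
    l = [float(radix)]
    d = []
    for q in qs:
        deaths = l[-1] * q          # d_x = l_x * q_x
        d.append(deaths)
        l.append(l[-1] - deaths)    # l_{x+1} = l_x - d_x
    return l, d

# Hypothetical rates for three consecutive ages (not taken from the text).
l, d = life_table([0.002, 0.003, 0.004])
print(l)
print(d)
```

Summing the $d_x$ over each ten-year interval then yields a column of the kind headed $\Sigma d_x$ above.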


16. GRADUATION BY FREQUENCY CURVES

It is to this table that we now
shall apply a process of re-graduation by means of the
method of compound frequency curves. Here we
have already an empirical representation of the
total compound curve of death or the $d_x$ curve.
This compound curve can now by simple and
straightforward processes be broken up into its
various component parts as to causes of deaths by
means of the various observed proportionate mortality
ratios, $R_x$, shown in Table H on page 163.

Let  us  for  the  sake  of  illustration  take  the  age 
interval  40 — 49.  According  to  our  empirically  con- 
structed table  as  derived  from  the  Massachusetts 
1915  census  we  find  that  the  number  of  deaths 
among  the  survivors  in  this  age  interval  amounts 
to  98,650. 

Applying to this number the observed proportionate
death ratios, $R_x$, in Table H we are able to
break this number up into its various component
parts according to the groups of causes of death
from which the numerical values of $R_x$ were derived.
These component parts are as follows:

Group          No. of Deaths

I                  1180
II                18050
III               17170
IV                17170
V                 14300
VI                23970
VII a & b          5820
VIII                990

Total:            98650

[Table: Graduation of the $d_x$ column. Number of deaths in each age interval, broken up into the component groups of causes of death.]


In the same manner we can break up the compound
curve (the $d_x$ curve) into its eight component
parts for all other age intervals, which finally gives
us the table of component groups
printed on the preceding page. Graphically this
table will represent a series of frequency diagrams
of the various groups of causes of deaths. It is an
easy matter to fit such diagrams to a system of
Laplacean-Charlier or Poisson-Charlier frequency
curves, which symbolically may be represented as
follows:

$$N_I F_I(x),\quad N_{II} F_{II}(x),\quad N_{III} F_{III}(x),\ \ldots$$

where $F(x)$ is the frequency function of the percentage
distribution according to age of the various
component groups or curves, while $N$ stands
for the areas of such curves.

These curve areas are simply the sub-totals of
the respective groups in the above table. The parameters
giving the equations of the curves $F_I(x)$,
$F_{II}(x)$, $F_{III}(x)$, ... are easily computed by the
method of moments and are shown in the following
table on page 207.
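
A rough sketch of the method of moments for one grouped frequency diagram is given below in Python. Several things here are assumptions and not statements of the text: the function name, the example data, and above all the scaling of the last two parameters, which are computed as the usual third and fourth Gram-Charlier coefficients $\mu_3 / (6\sigma^3)$ and $(\mu_4/\sigma^4 - 3)/24$. The sign and scale used for "skewness" and "excess" in the table that follows may differ from this convention, and no correction for grouping is applied.

```python
import math

def moment_parameters(ages, counts):
    """Mean, dispersion and two higher coefficients of a grouped frequency diagram.

    ages:   central age of each interval
    counts: number of deaths in each interval
    """
    n = sum(counts)
    mean = sum(a * c for a, c in zip(ages, counts)) / n

    def central(k):
        # k-th moment about the mean.
        return sum(c * (a - mean) ** k for a, c in zip(ages, counts)) / n

    sigma = math.sqrt(central(2))
    c3 = central(3) / (6 * sigma ** 3)          # assumed skewness convention
    c4 = (central(4) / sigma ** 4 - 3) / 24     # assumed excess convention
    return mean, sigma, c3, c4

# Hypothetical symmetric diagram, for illustration only.
mean, sigma, c3, c4 = moment_parameters([10, 20, 30], [1, 2, 1])
print(mean, sigma, c3, c4)
```

For a symmetric diagram such as this one the third coefficient vanishes, which gives a quick check on the arithmetic.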

Once  having  determined  the  parameters  of  the 
various  frequency  curves  it  is  a  simple  matter  to 
construct  the  final  mortality  table  which  is  shown 
in  the  addenda. 
Values of Parameters of Component Curves,
Massachusetts, 1914–1916 Males¹

Group     Mean     Dispersion     Skewness     Excess

I         75.0        9.78         +0.080      -0.005
II        67.5       13.65         +0.117      +0.017
III       64.0       14.12         +0.124      +0.030
IV        60.5       16.51         +0.089      -0.006
V         50.0       18.61         +0.026      -0.034
VI        43.5       15.57         -0.036      -0.023
VIIb      57.5       16.33         -0.027      -0.028

¹ In this grouping I have combined VIIa and VIII
into a single group and roughly fitted this group to a
truncated Poisson-Charlier curve. This, of course, is not
exact and evidently introduces errors in the younger
age interval from 10–19. For ages above 20 this curve
is of no importance and the other curves should for
the ages above 20 give a satisfactory fit. If absolute
exactitude were required for younger ages it would
indeed offer no difficulties to compute curves VIIa and
VIII separately and thus obtain a much closer fit in
the youngest age interval. In view of the fact that
the present calculation is a test case only, it has not
been thought necessary to go to these refinements.
This defect will of course also affect to a slight extent
group VIIb.

It now remains for us to compare the final values
of $q_x$ which we obtain from the three tables:

A) The values of $q_x$ as computed in the usual
way from the number of lives exposed to risk and
the corresponding deaths at various ages.

B) The values of $q_x$ as obtained by a re-graduation
of the mortality table under A by means of
compound frequency curves.

C) The values of $q_x$ constructed from mortuary
records by sex, age and cause of death, but without
knowing the numbers of lives exposed to risk.

Massachusetts Males. 1914–1916.
Values of 1000 q_x by various methods.

Age        A          B          C

17        3.33       3.15       3.27
22        4.64       3.99       4.28
27        5.29       5.04       5.46
32        6.68       6.72       7.03
37        8.25       8.63       8.88
42       10.65      10.83      11.05
47       13.53      13.86      14.05
52       18.67      18.83      19.13
57       26.38      26.88      27.66
62       38.29      38.79      40.26
67       58.12      59.04      56.54
72       81.90      76.50      77.61
77      109.91     103.69     107.51
82      165.02     137.97     148.79

I think that every unbiased investigator will
admit that there exists a close agreement between
the three series. It is indeed difficult to
say which one of the three is the most probable.
We know that on account of the great perturbations
due to misstatements of ages the values
under A are affected with considerable errors. The
usual interpolation or summation formulas do not
suffice to remove these errors and tend often to
increase them. A re-graduation by means of frequency
curves as shown in series B will in all
probability give better results, although on account
of the large age interval (10 years) in which
the causes of deaths are grouped in the Massachusetts
reports this method does not come to its
full rights. The values of $q_x$ under A and B are
naturally closely related to each other, and those
in series B cannot be derived unless the values
in series A are known beforehand. Series C on
the other hand is independent of either A or B,
having been derived by means of entirely different
methods of construction.
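
The degree of agreement between the series can be put in figures by a short Python computation. The values of series A and C are those of the comparison table; the function name and the choice of the relative deviation from series A as the measure are assumptions made for this illustration.

```python
ages = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82]
series_a = [3.33, 4.64, 5.29, 6.68, 8.25, 10.65, 13.53, 18.67,
            26.38, 38.29, 58.12, 81.90, 109.91, 165.02]
series_c = [3.27, 4.28, 5.46, 7.03, 8.88, 11.05, 14.05, 19.13,
            27.66, 40.26, 56.54, 77.61, 107.51, 148.79]

def relative_deviation(reference, other):
    """Deviation of each value of `other` from `reference`, as a fraction of the latter."""
    return [(o - r) / r for r, o in zip(reference, other)]

dev_c = relative_deviation(series_a, series_c)
# Age at which series C departs most from series A.
worst_age = max(zip(ages, dev_c), key=lambda pair: abs(pair[1]))[0]

for age, dev in zip(ages, dev_c):
    print(age, f"{100 * dev:+.1f}%")
```

On this measure series C stays within ten per cent of series A at every age shown, the widest departure falling at age 82.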


See footnote on page 127.

17. COMPARISON OF DIFFERENT METHODS

A comparison between the parameters
in the separate component
curves in B and C
gives us, however, a way of testing the validity
of the hypothesis upon which the method of
series C rests. In the case of the series C we started
with the hypothesis of the existence of a set
of frequency curves of the percentage distribution
of the number of deaths according to age among
the various groups. On the basis of this hypothesis
and from the observed values of the proportionate
death ratios, $R_x$, we determined by the method
of least squares the areas of this postulated set of
frequency curves. In the case of the B series we
broke up the empirically constructed compound
death curve (the $d_x$ curve) into its various component
parts according to a similar classification
of causes of deaths as under C. We have therefore
in this case an empirical determination of the
areas of the component curves and all that we
need to do is to graduate the rough frequency
diagrams as represented by such areas to a system
of frequency curves.

Let us now briefly examine how far the various skew
frequency curves in series B and C differ from each other.
In regard to the various statistical parameters of the
separate groups we have the following results:


Means.

Group     Series C    Series B
I           78.5        75.0
II          68.0        67.5
III         63.0        64.0
IV          60.5        60.5
V           49.5        50.0
VI          44.0        43.5
VIIb        57.5        57.5

Dispersions.

Group     Series C    Series B
I           7.98        9.78
II         12.21       13.65
III        13.05       14.12
IV         17.86       16.51
V          18.51       18.61
VI         14.68       15.57
VIIb       12.16       16.33

Skewness.

Group     Series C    Series B
I          +0.092      +0.080
II         +0.115      +0.117
III        +0.121      +0.124
IV         +0.098      +0.089
V          +0.033      +0.026
VI         -0.010      -0.036
VIIb       -0.002      -0.027

Excess.

Group     Series C    Series B
I          -0.033      -0.005
II         +0.023      +0.017
III        +0.047      +0.030
IV         -0.009      -0.006
V           0.031      -0.034
VI         -0.027      -0.023
VIIb        0.003      -0.028

Taken all in all there is found to exist a satisfactory
agreement between the hypothetical values in series C and the
values derived by empirical methods. It is only in group I
that we find some important discrepancies. This group
contains causes of death typical of extreme old age where we
naturally may expect great perturbations owing to large
errors from random sampling, especially in series B. In this
same connection we may also mention that the empirically
determined values under series B are subject to a slight
correction by means of the Sheppard formulas, which were not
employed in my computations.
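
The group-by-group agreement described above can be tabulated directly. The following is a minimal sketch in Python, using only the means and dispersions as printed in the tables above; the function names are illustrative and not part of the original computations.

```python
# Means and dispersions (in years) as printed above:
# group -> (Series C, Series B).
means = {
    "I": (78.5, 75.0), "II": (68.0, 67.5), "III": (63.0, 64.0),
    "IV": (60.5, 60.5), "V": (49.5, 50.0), "VI": (44.0, 43.5),
    "VIIb": (57.5, 57.5),
}
dispersions = {
    "I": (7.98, 9.78), "II": (12.21, 13.65), "III": (13.05, 14.12),
    "IV": (17.86, 16.51), "V": (18.51, 18.61), "VI": (14.68, 15.57),
    "VIIb": (12.16, 16.33),
}


def gaps(table):
    """Return {group: Series C minus Series B}, rounded to two decimals."""
    return {g: round(c - b, 2) for g, (c, b) in table.items()}


def widest(table):
    """Return the group with the largest absolute C - B difference."""
    d = gaps(table)
    return max(d, key=lambda g: abs(d[g]))


if __name__ == "__main__":
    for name, table in (("means", means), ("dispersions", dispersions)):
        print(name, gaps(table), "widest gap in group", widest(table))
```

For the means the widest gap falls in group I (3.5 years), in line with the remark above. For the dispersions, as the figures are printed in this copy, the widest gap falls in group VIIb.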

We have already mentioned that the system of frequency
curves which we chose a priori for Massachusetts (Series C)
was the same system which we had used on a previous occasion
in the construction of a mortality table for English Males
for the period 1911-1912.* This is a fact of no small
importance. It will in general be found that the percentage
distribution according to age in the various component
curves differs little in different sample populations. Even
in the case of American Locomotive Engineers it was found
possible to use the same set of curves as in the case of
Massachusetts and England and Wales. In the same way I have
found that the set of curves used in the construction of the
table of Michigan Males also can be used in the case of
males in the urban population of Denmark. With a very few
exceptions I have found it possible to get along with a
limited number of sets of curves, say four or five sets.
Should it nevertheless prove impossible to fit the original
data to any one of these particular curve systems, it will
in most cases be found possible by means of successive
approximations to reach a system of curves which may be made
the a priori basis for the construction of the final table,
as was the case in the table for Japanese assured males.

Finally we come to the comparison of the various areas of
the component curves. We have here:


*  See "Proceedings of the Casualty Actuarial Society
of America", Vol. IV, page 409.

Areas.

Group              C          B
I              90064     105000
II            281470     296190
III           207854     213010
IV            151316     144200
V              99543      87850
VI            107718     106260
VII & VIII     62035      47410

Total        1000000    1000000
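
The two columns of areas can be compared group by group. The sketch below, in Python, expresses each difference C - B as a percentage of the B area; the helper names are illustrative only. As the figures are transcribed here, the B column adds up to 999,920 rather than the stated 1,000,000, so one of the B entries presumably carries a misprint.

```python
groups = ["I", "II", "III", "IV", "V", "VI", "VII & VIII"]
area_c = [90064, 281470, 207854, 151316, 99543, 107718, 62035]
area_b = [105000, 296190, 213010, 144200, 87850, 106260, 47410]


def relative_gap(c, b):
    """Difference C - B expressed as a percentage of the B area."""
    return 100.0 * (c - b) / b


def summary():
    """Return {group: percentage gap}, rounded to one decimal."""
    return {
        g: round(relative_gap(c, b), 1)
        for g, c, b in zip(groups, area_c, area_b)
    }


if __name__ == "__main__":
    print("sum of C:", sum(area_c))
    print("sum of B:", sum(area_b))
    for group, gap in summary().items():
        print(f"{group:<11}{gap:+6.1f} %")
```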

Evidently the agreement is not so close in this case. But
it would indeed be rather rash to assert that the values in
series C are faulty. One must here bear in mind the
diametrically opposite principles employed in the
determination of these areas. In series B we have a direct
determination by empirical methods. In this determination we
shall, however, find reflected all the systematic and
observational errors originally present in series A, from
which the curves under B were computed. Every error due to
misstatements of ages and every systematic error introduced
by the summation or interpolation formulas will be directly
reflected in the areas under series B, and such areas can
therefore in a sense only be considered as a first
approximation to the true or presumptive areas.

Another point well worth remembering is that no conditions
are imposed upon the areas in series B. In series C, where we
work with mortuary records only, we have on the other hand
the very important condition or restriction requiring that
the areas of the component curves must be so determined that
their ratios to the compound curve for various age intervals
will conform as closely as possible with the observed
proportionate death ratios, R_B(x), for those same age
intervals.

In order to test the influence of this additional
requirement in respect to conformity to observed
proportionate death ratios we might use the values of the
component curves under series B as a first approximation and
then afterwards determine the correction factors a for the
areas in exactly the same way as in the case of series C. No
doubt such a calculation would tend to improve the table.

A difficulty occurs, however, in the case of the
Massachusetts data owing to the large interval of 10 years
into which the causes of death by attained ages are grouped.
As pointed out in the footnote on page 127 the quantity
R_B(x), (x = 10, 11, 12, ..., 100; B = I, II, III, ...), can
only be considered as being independent of the "exposed to
risk" if the age interval into which the deaths fall is
sufficiently small. If this is not the case, the "central"
values of R_B(x) are subject to certain corrections. In the
case of the groups of causes of death typical of younger
ages the observed "central" values of R_VII(x) and R_VIII(x)
for the age intervals 10-19, 20-29, 30-39 are evidently too
high, while on the other hand the values of R_I(x) and
R_II(x) in the case of the age intervals 60-69, 70-79,
80-89, 90-100 are too low as compared with the true values
of R(x) at these "central" ages. I have, however, tacitly
ignored this fact in my computations. The subsequent result
is that the final values of q_x for the younger ages in
column C as shown on page 208 are in all probability a
little too high, and the values of q_x above 65 too low. In
the case of the other tables as shown in the present book
the age interval into which the causes of death were
arranged was 5 years or less, and the error was thus reduced
to such an extent that further corrections may be
disregarded for all practical purposes.

Showing Detailed Mortality Tables and Death Curves for

1)  Japanese Assured Males (1914-1917)

2)  Metropolitan Life, White Males (1911-1916)

3)  American Coal Miners (1913-1917)

4)  American Locomotive Engineers (1913-1917)

5)  Massachusetts Males (Series C) (1914-1916)

6)  Michigan Males (1909-1915)

7)  Massachusetts Males (Series B) (1914-1916).

Mortality Table: Japanese Assured Males,
1914-1917 (Aggregate Table)


Age       I      II     III     IVa     IVb     d_x      l_x 1000 q_x
 15      24      65     343            2379    2811  1000000     2.81
 16      39      74     360            3645    4118   997189     4.13
 17      43      84     388            4888    5403   993071     5.44
 18      48      93     415            5981    6557   987668     6.64
 19      54     107     446            6826    7433   981111     7.58
 20      60     120     478            7447    8105   973678     8.32
 21      68     136     513            7716    8432   965573     8.73
 22      77     163     550      12    7734    8626   957141     8.91
 23      87     171     591      27    7581    8467   948616     8.92
 24     101     195     633      50    7274    8253   940168     8.86
 25     111     218     678      77    6864    7948   931906     8.53
 26     126     246     729     112    6384    7597   923957     8.22
 27     140     278     780     153    5860    7211   916360     7.87
 28     160     315     838     206    5341    6860   909149     7.54
 29     178     353     899     268    4821    6519   902289     7.22
 30     198     395     963     341    4323    6220   895770     6.94
 31     227     446    1033     425    3853    5984   889550     6.73
 32     252     501    1109     521    3421    5804   883566     6.59
 33     286     557    1185     629    3021    5678   877762     6.46
 34     319     626    1273     751    2665    5633   872084     6.46
 35     358     700    1364     885    2336    5643   866451     6.51
 36     401     779    1460    1031    2048    5719   860808     6.64
 37     450     872    1564    1186    1797    5869   855089     6.86
 38     502     970    1671    1350    1566    6059   849220     7.13
 39     570    1081    1791    1524    1366    6332   843161     7.61
 40     638    1197    1916    1701    1191    6643   836829     7.94
 41     716    1332    2049    1883    1037    7017   830186     8.46
 42     802    1475    2193    2066     903    7439   823169     9.04
 43     899    1632    2341    2249     783    7904   815730     9.69
 44    1006    1799    2501    2428     680    8413   807826    10.41
 45    1126    1985    2671    2599     598    8979   799413    11.23
 46    1261    2180    2852    2764     514    9671   790434    12.10
 47    1406    2393    3042    2917     447   10205   780863    13.07
 48    1675    2611    3236    3061     395   10878   770668    14.12
 49    1754    2867    3459    3187     339   11606   759780    15.27
 50    1957    3122    3666    3298     295   12338   748174    16.49
 51    2180    3395    3892    3389     257   13113   735836    17.82
 52    2426    3679    4136    3473     224   13938   722723    19.29
 53    2692    3984    4380    3532     195   14783   708785    20.86
 54    2987    4285    4638    3576     172   15658   694002    22.56
 55    3306    4610    4922    3611     147   16596   678344    24.47
 56    3654    4940    5177    3612     130   17513   661748    26.46
 57    4026    5274    5456    3605     113   18474   644235    28.68
 58    4432    5603    5742    3581      97   19456   625761    31.09
 59    4857    5937    6025    3544      84   20447   606306    33.72
 60    5316    6257    6316    3498      74   21461   585859    36.63
 61    5796    6668    6604    3424      69   22460   564398    39.79
 62    6293    6860    6890    3345      59   23447   541938    43.27
 63    6806    7129    7162    3255      51   24402   518491    47.15
 64    7332    7361    7423    3150      43   25309   494089    51.22
 65    7854    7570    7672    3042      38   26176   468780    55.84
 66    8366    7727    7896    2919      36   26944   442604    60.88
 67    8863    7838    8089    2791      31   27612   415660    66.43
 68    9313    7894    8257    2655      28   28147   388048    72.53
 69    9719    7894    8385    2511      23   28532   359901    79.27
 70   10053    7829    8468    2362      20   28732   331369    86.71
 71   10294    7700    8503    2212      18   28727   302637    94.92
 72   10424    7496    8477    2067      15   28479   273910   103.97
 73   10424    7227    8389    1901      13   27954   245431   110.69
 74   10280    6897    8230    1746      13   27166   217477   124.91
 75    9970    6503    8002    1593      10   26078   190311   137.02
 76    9492    6057    7695    1444      10   24698   164233   150.38
 77    8834    5571    7313    1298       8   23024   139535   165.00
 78    8037    5047    6853    1159       7   21103   116511   181.12
 79    7086    4499    6314    1026       6   18931    95408   198.42
 80    6046    3943    5733     900       5   16621    76477   217.33
 81    4953    3400    5091     784       4   14232    59856   237.77
 82    3871    2862    4421     676       3   11833    45624   259.35
 83    2813    2365    3730     577       2    9487    33791   280.75
 84    1957    1907    3046     489       1    7400    24304   304.48
 85    1232    1498    2396     412            5538    16904   327.61
 86     701    1141    1797     340            3979    11366   350.08
 87     343     844    1275     277            2739     7387   370.76
 88     140     603     844     225            1812     4648   389.78
 89      48     408     516     179            1151     2836   405.85
 90      14     269     283     141             707     1685   419.58
 91       5     171     134     110             420      978   429.44
 92             111      53      83             247      558   442.65
 93              56      14      63             133      311   452.10
 94              28       4      44              76      178   457.05
 95              14       2      31              47      102   460.78
 96               5       1      22              28       55   509.01
 97                              14              14       27   518.50
 98                               9               9       13   692.30
 99                               4               4        4  1000.00
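
The last three columns of the table above are tied together by the usual life-table identities, l_(x+1) = l_x - d_x and 1000 q_x = 1000 d_x / l_x. A minimal sketch in Python of a consistency check follows, using the rows for ages 15 to 18 of the Japanese table; the function name and the tolerance are illustrative assumptions. Such a check is a convenient way of locating misprinted figures.

```python
# (age, d_x, l_x, printed 1000 q_x) for ages 15-18 of the Japanese table.
rows = [
    (15, 2811, 1000000, 2.81),
    (16, 4118, 997189, 4.13),
    (17, 5403, 993071, 5.44),
    (18, 6557, 987668, 6.64),
]


def check_rows(rows, tol=0.006):
    """Return a list of (age, problem) for rows breaking either identity.

    tol allows for the printed rate being rounded to two decimals.
    """
    problems = []
    for i, (age, d, l, q1000) in enumerate(rows):
        # Identity 1: 1000 q_x = 1000 d_x / l_x.
        if abs(1000.0 * d / l - q1000) > tol:
            problems.append((age, "q_x"))
        # Identity 2: l_(x+1) = l_x - d_x (needs the following row).
        if i + 1 < len(rows) and rows[i + 1][2] != l - d:
            problems.append((age, "l_x+1"))
    return problems


if __name__ == "__main__":
    print(check_rows(rows) or "rows are consistent")
```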

Mortality Table: Metropolitan White Males, 1911-1916

Age       I      II     III     IVb     IVa     d_x      l_x 1000 q_x
 10      80     153     205      47    1720    2205  1000000     2.21
 11      95     179     274      61    1776    2385   997793     2.39
 12     118     210     350      77    1812    2567   995410     2.58
 13     141     244     444      96    1832    2757   992843     2.78
 14     168     282     550     116    1834    2950   990086     2.98
 15     202     327     671     140    1825    3165   987136     3.21
 16     240     373     810     171    1803    3397   983971     3.45
 17     282     427     960     199    1772    3640   980574     3.71
 18     336     483    1130     233    1733    3915   976934     4.01
 19     393     545    1315     274    1680    4207   973019     4.32
 20     454     611    1514     311    1612    4502   968812     4.65
 21     527     685    1728     358    1539    4837   964310     5.02
 22     599     765    1951     407    1449    5169   959473     5.39
 23     687     846    2184     459    1363    5538   954304     5.80
 24     775     932    2428     515    1279    5929   948766     6.25
 25     874    1024    2674     575    1190    6337   942837     6.72
 26     977    1120    2924     638    1107    6766   936500     7.32
 27    1088    1223    3173     703    1012    7199   929734     7.74
 28    1202    1328    3414     770     923    7637   922535     8.28
 29    1324    1436    3648     839     840    8087   914898     8.84
 30    1473    1549    3879     909     757    8567   906811     9.45
 31    1584    1662    4089     985     684    9004   898244    10.02
 32    1702    1779    4283    1052     614    9430   889240    10.60
 33    1863    1899    4459    1125     545    9891   879810    11.24
 34    2012    2015    4604    1196     485   10312   869919    11.85
 35    2160    2139    4740    1266     427   10732   859607    12.48
 36    2324    2259    4842    1332     378   11135   848875    13.12
 37    2485    2379    4919    1399     335   11517   837740    13.75
 38    2664    2501    4968    1462     296   11891   826223    14.39
 39    2847    2617    4989    1520     258   12231   814332    15.02
 40    3057    2734    4988    1577     226   12578   802101    15.68
 41    3272    2848    4953    1628     192   12893   789523    16.33
 42    3508    2960    4898    1675     163   13204   776630    17.00
 43    3767    3066    4821    1719     143   13516   763426    17.70
 44    4057    3170    4719    1757     120   13823   749910    18.43
 45    4389    3267    4604    1789     100   14149   736087    19.22
 46    4748    3358    4471    1816      90   14483   721938    20.06
 47    5153    3447    4320    1839      75   14834   707455    20.97
 48    5599    3526    4160    1855      61   15201   692621    21.95
 49    6064    3598    3991    1867      50   15590   677420    23.01
 50    6631    3663    3810    1872      42   16018   661830    24.20
 51    7198    3721    3630    1872      35   16456   645812    25.48
 52    7820    3769    3443    1867      30   16929   629356    26.90
 53    8492    3809    3254    1857      22   17434   612427    28.47
 54    9168    3839    3069    1840      10   17926   594993    30.13
 55    9897    3858    2876    1820       1   18452   577067    31.98
 56   10637    3868    2696    1793           18994   558615    34.00
 57   11378    3867    2519    1762           19526   539621    36.18
 58   12114    3853    2340    1726           20033   520095    38.52
 59   12847    3830    2169    1687           20533   500062    41.06
 60   13555    3794    2004    1640           20991   479529    43.77
 61   14217    3746    1844    1591           21396   458538    46.67
 62   14817    3685    1692    1541           21735   437140    49.72
 63   15359    3615    1547    1484           22005   415405    52.97
 64   15820    3535    1408    1425           22188   393400    56.40
 65   16179    3443    1277    1364           22263   371212    59.97
 66   16450    3340    1153    1299           22242   348949    63.74
 67   16610    3229    1037    1235           22111   326707    67.68
 68   16691    3109     930    1166           21896   304596    71.89
 69   16591    2981     828    1098           21498   282700    76.05
 70   16412    2851     736    1030           21029   261202    80.51
 71   16107    2711     649     955           20422   240173    85.03
 72   15721    2568     571     892           19752   219751    89.88
 73   15225    2423     500     825           18973   199999    94.87
 74   14629    2271     434     759           18093   181026    99.95
 75   13946    2126     377     695           17144   162933   105.22
 76   13225    1976     325     632           16158   145789   110.83
 77   12423    1828     278     572           15101   129631   116.49
 78   11580    1684     237     515           14016   114530   122.38
 79   10729    1543     200     461           12933   100514   128.67
 80    9840    1406     167     411           11824    87581   135.01
 81    8950    1272     138     363           10723    75757   141.54
 82    8092    1144     115     318            9669    65034   148.68
 83    7237    1024      98     282            8641    55365   156.07
 84    6420     911      79     247            7657    46724   163.88
 85    5645     806      65     208            6724    39067   172.11
 86    4920     707      53     181            5861    32343   181.21
 87    4240     615      43     150            5048    26482   190.62
 88    3622     531      34     126            4313    21434   201.22
 89    3065     457      27     106            3655    17121   213.48
 90    2550     387      22      87            3046    13466   226.20
 91    2099     327      16      70            2512    10420   241.07
 92    1698     270      14      56            2038     7908   257.71
 93    1355     222      11      45            1633     5870   278.19
 94    1053     179       8      35            1275     4237   300.92
 95     805     143       6      27             981     2962   331.20
 96     595     112       5      20             732     1981   369.51
 97     412      85       1      14             512     1249   409.93
 98     286      62              10             358      737   485.75
 99     198      27               6             231      379   609.50
100      95      15               4             114      148   770.27
101      27       5               2              34       34  1000.00

Mortality Table: American Coal Miners (1913-1917)

Age     I     II    III     IV     Va     Vb     VI     d_x      l_x 1000 q_x
 18           99    124    142   4566      7    366    5304  1000000     5.30
 19          114    144    164   4702     10    408    5542   994696     5.57
 20          140    168    187   4954     14    452    5915   989154     5.98
 21          162    194    214   5196     19    498    6283   983239     6.39
 22          190    223    243   5234     27    546    6463   976956     6.62
 23          223    250    272   5151     38    597    6531   970493     6.73
 24          256    282    307   5067     50    646    6608   963962     6.86
 25          298    315    341   4952     69    697    6672   957354     6.97
 26          341    349    379   4846     91    749    6755   950682     7.11
 27          390    386    421   4748    120    802    6867   943927     7.27
 28          440    424    465   4683    156    853    7021   937060     7.49
 29          498    461    508   4569    202    903    7141   930039     7.68
 30          557    500    560   4413    257    953    7240   922898     7.84
 31          622    538    609   4220    326   1002    7317   915658     7.99
 32          688    579    663   4000    408   1048    7386   908341     8.13
 33          761    618    718   3757    505   1093    7452   900955     8.27
 34          837    654    777   3500    618   1133    7519   893503     8.42
 35          915    693    840   3233    749   1175    7605   885984     8.58
 36          994    732    905   2963    898   1212    7704   878379     8.77
 37         1084    775    973   2697   1064   1246    7839   870675     9.00
 38         1171    818   1045   2435   1251   1277    7997   862836     9.27
 39         1267    867   1124   2184   1452   1305    8199   854839     9.59
 40         1364    920   1206   1946   1667   1329    8432   846640     9.96
 41         1471    978   1293   1723   1894   1352    8711   838208    10.39
 42         1581   1045   1386   1515   2131   1369    9027   829497    10.88
 43         1705   1125   1489   1325   2372   1383    9399   820470    11.46
 44         1835   1222   1585   1106   2609   1395    9752   811071    12.02
 45     1   1976   1322   1712    883   2841   1403   10133   801319    12.65
 46     6   2132   1444   1837    853   3063   1408   10743   791186    13.68
 47    10   2302   1584   1971    729   3265   1410   11271   780443    14.44
 48    21   2492   1741   2114    619   3443   1408   11838   769172    15.39
 49    32   2705   1918   2265    524   3595   1402   12441   757334    16.43
 50    42   2934   2118   2423    442   3706   1395   13060   744893    17.53
 51    54   3190   2337   2589    368   3790   1383   13711   731833    18.74
 52    73   3470   2567   2764    307   3832   1368   14380   718122    20.02
 53    94   3775   2820   2945    255   3832   1352   15073   703742    21.42
 54   123   4104   3086   3130    210   3790   1331   15774   688669    22.91
 55   153   4437   3355   3313    173   3706   1308   16445   672895    24.44
 56   185   4843   3637   3501    141   3595   1281   17183   656450    26.18
 57   225   5246   3922   3689    115   3443   1252   17892   639267    27.99
 58   268   5656   4192   3872     93   3265   1220   18566   621375    29.88
 59   310   6085   4454   4047     76   3063   1186   19221   602809    31.89
 60   354   6530   4703   4209     61   2841   1148   19846   583588    34.01
 61   402   6970   4936   4364     48   2609   1109   20438   563742    36.25
 62   450   7403   5133   4500     39   2372   1076   20964   543304    38.69
 63   508   7832   5305   4618     30   2131   1023   21447   522340    41.05
 64   573   8230   5438   4718     24   1894    978   21855   500893    43.63
 65   648   8615   5533   4795     19   1667    931   22208   479038    46.36
 66   746   8954   5581   4846     15   1452    884   22478   456830    49.20
 67   875   9255   5596   4871     13   1251    834   22695   434352    52.25
 68  1015   9507   5563   4871      9   1064    785   22814   411657    55.41
 69  1207   9704   5479   4841      6    898    736   22871   388843    58.81
 70  1437   9846   5358   4786      6    749    686   22868   365972    62.49
 71  1702   9917   5196   4701      4    618    637   22775   343104    66.38
 72  2008   9931   4999   4592      4    505    588   22627   320329    70.64
 73  2334   9871   4771   4460      2    408    540   22386   297702    75.20
 74  2677   9747   4513   4302      2    326    494   22061   275316    80.10
 75  3028   9557   4233   4125      2    257    449   21651   253255    85.49
 76  3332   9307   3941   3929      1    202    408   21120   231604    91.19
 77  3610   9001   3638   3722      1    156    366   20494   210484    97.37
 78  3827   8643   3322   3496           120    329   19737   189990   103.88
 79  3967   8237   3012   3267            91    293   18867   170253   110.82
 80  4020   7799   2704   3029            69    258   17879   151386   118.10
 81  3980   7327   2411   2788            50    226   16782   133507   125.70
 82  3916   6803   2123   2552            38    198   15630   116725   133.90
 83  3658   6315   1846   2313            27    171   14330   101095   141.75
 84  3370   5801   1596   2085            19    147   13018    86765   150.04
 85  3040   5286   1366   1862            14    125   11693    73747   158.56
 86  2684   4776   1151   1650            10    105   10376    62054   167.21
 87  2305   4281    957   1448             7     88    9086    51678   175.82
 88  1937   3809    789   1261             5     71    7872    42592   184.82
 89  1584   3353    640   1085             3     60    6725    34720   193.69
 90  1269   2924    513    927             2     48    5683    27995   203.00
 91   985   2535    404    784             2     38    4748    22312   212.80
 92   747   2168    310    650             1     29    3905    17564   222.33
 93   551   1845    231    531                   22    3180    13659   232.81
 94   396   1545    170    428                   17    2556    10479   243.92
 95   278   1279    119    338                   12    2026     7923   255.71
 96   198   1050     79    261                    7    1594     5897   270.31
 97   126    845     48    195                    5    1219     4303   283.29
 98    85    672     26    140                    2     925     3084   299.94
 99    70    525      9     96                          701     2159   324.69
100    35    401            59                          495     1458   339.51
101    24    298            29                          351      963   364.48
102    19    217             4                          240      612   392.16
103    14    149                                        163      372   438.17
104    10     97                                        107      209   511.96
105     8     55                                         63      102   727.65
106     6     25                                         37       39   794.87
107     3      2                                          5        8   625.00
108     2                                                 2        3   666.67
109     1                                                 1        1  1000.00

ADDENDA II


In order to show a rapid application of frequency curve
methods to the graduation of mortality tables when the number
of lives exposed to risk at various ages is known, the
following data, relating to applicants who had been rejected
for life assurance on account of impaired health by
Scandinavian assurance companies, are instructive. The
original statistics as collected by a committee of the
insurance companies were first published in the quinquennial
report (1910-1915) of the Danish Government Life Assurance
Institution (The Statsanstalt) for 1917.

The material related to Scandinavian and Finnish applicants
who previously to 1893 (and in the case of two Danish
companies before 1899) had been rejected for life assurance.
By a special investigation, the committee followed up these
rejections and sought to establish whether the applicants
were alive at July 1, 1899, or were previously deceased.
Detailed reports for the full period during which the risks
were under observation were available for 8,208 individual
applicants. For 2,023 applicants complete data were not
available.

The final statistical results of the Statsanstalt's
investigation are shown in the following summary table:

TABLE I.

Mortuary Experience of Rejected Risks of Scandinavian
Life Companies.

Attained    No. Exposed      Number
Age           to Risk       of Deaths
15-19             434            6
20-24           3,831           28
25-29          11,405          145
30-34          17,644          233
35-39          19,442          318
40-44          17,600          324
45-49          13,971          296
50-54          10,179          295
55-59           6,640          264
60-64           3,927          194
65-69           1,995           96
70-74             836           71
75-79             306           32
80-84              98           20
85-89              12            3

The exposed to risk by separate ages and the correlated
deaths are shown in Table II in Columns 2 and 3, from which
we, without difficulty, obtain the crude or ungraduated
mortality rates, as shown in Column 4.

We next assume a purely hypothetical frequency distribution
of the exposed to risk, according to age, represented by a
Laplacean normal probability curve with its mean or origin
at age fifty and a dispersion equal to 12.5 years, as shown
in Column 5. The frequency distribution of the number of
deaths on the basis of the ungraduated mortality rates in
Column 4 and the above-mentioned normal probability curve is
shown in Column 6, which may be considered as an ungraduated
compound frequency curve.*

Arranged in quinquennial age intervals this latter
frequency distribution is shown in the following summary
table:


Ages          No. of Deaths
13-17               51
18-22               75
23-27              329
28-32              711
33-37            1,464
38-42            2,498
43-47            3,649
48-52            5,377
53-57            6,238
58-62            6,232
63-67            5,254
68-72            3,605*
73-77            2,536
78-82            1,425
83-87            1,169
88-92              351
93 or over          95

Total           41,059
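
The construction of Columns 5 and 6 can be sketched as follows in Python. The scale factor of 100,000 is an assumption: it is not stated in the text but is inferred from the first printed values of Column 5 in Table II (792, 987, 1223 at ages 15, 16, 17). The function names are illustrative.

```python
import math

SCALE = 100_000            # assumed; inferred from the printed Column 5
MEAN, DISPERSION = 50.0, 12.5


def normal_ordinate(z):
    """Ordinate of the standard normal probability curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def hypothetical_exposure(age):
    """Column 5: hypothetical exposed to risk at a given age."""
    return SCALE * normal_ordinate((age - MEAN) / DISPERSION)


def crude_death_curve(age, exposed, deaths):
    """Column 6: Column 5 times the crude rate of Column 4."""
    rate = deaths / exposed if exposed else 0.0
    return hypothetical_exposure(age) * rate


# First rows of Table II: age -> (exposed to risk, deaths).
observed = {15: (11, 0), 16: (31, 1), 17: (64, 1)}

if __name__ == "__main__":
    for age, (exposed, deaths) in observed.items():
        print(age,
              round(hypothetical_exposure(age)),
              round(crude_death_curve(age, exposed, deaths)))
```

Under this assumed scale the sketch agrees with the first three printed rows of Columns 5 and 6.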

The above frequency distribution is now subjected to a
graduation by means of the Laplacean-Charlier or
Gram-Charlier frequency function. The mathematical
calculations give the following parameters:

Mean Age       57.75 years
Dispersion     13.32 years
Skewness       -0.0031
Excess         -0.0037


*  A slight adjustment was made in the figures in column (6)
corresponding to age 70, and in the age groups above the age
of 88.

Applying these parameters to standard probability tables we
obtain the usual Laplacean-Charlier frequency curve.
Distributing the 41,059 individual deaths according to this
frequency curve we obtain column (7), which is the graduated
death curve corresponding to the hypothetical exposure as
given by column (5). The final mortality rates per 1,000 of
exposed to risk are then found by dividing (7) by (5) and
are shown in column (8).
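
The graduation step can be sketched as follows in Python. Two assumptions are made which the text does not state: that the skewness and excess enter directly as the coefficients of the third and fourth derivatives of the normal ordinate, and that the hypothetical exposure of column (5) is 100,000 times the normal ordinate with mean 50 and dispersion 12.5. The function names are illustrative, and this is not a reproduction of the original calculation.

```python
import math

N_DEATHS = 41_059
MEAN, DISPERSION = 57.75, 13.32
SKEWNESS, EXCESS = -0.0031, -0.0037


def phi(z):
    """Ordinate of the standard normal probability curve."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def graduated_deaths(age):
    """Column (7), under the assumed form of the series:
    (N / sigma) * [phi(z) + skewness * phi'''(z) + excess * phi''''(z)].
    """
    z = (age - MEAN) / DISPERSION
    third = (3.0 * z - z ** 3) * phi(z)               # phi'''(z)
    fourth = (z ** 4 - 6.0 * z * z + 3.0) * phi(z)    # phi''''(z)
    series = phi(z) + SKEWNESS * third + EXCESS * fourth
    return N_DEATHS / DISPERSION * series


def hypothetical_exposure(age):
    """Column (5), with the assumed scale of 100,000."""
    return 100_000 * phi((age - 50.0) / 12.5)


def rate_per_1000(age):
    """Column (8): 1000 times column (7) divided by column (5)."""
    return 1000.0 * graduated_deaths(age) / hypothetical_exposure(age)


if __name__ == "__main__":
    for age in (16, 17, 40, 60, 80):
        print(age, round(graduated_deaths(age), 1),
              round(rate_per_1000(age), 2))
```

Under these assumptions the sketch comes within a few tenths of the printed column (7) values at ages 16 and 17 (7.1 and 9.2). Exact agreement is not to be expected, since the parameters are quoted to four decimals only.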

In order to show how closely the graduation by means of
frequency curves agrees with the actual observations, I have
made a calculation of the "actual" to the "expected" deaths
by quinquennial age intervals as shown in the following
table:

TABLE III.

Comparison between "Actual" and "Expected" Deaths on the
Basis of the Graduated Mortality Rates of the Scandinavian
Mortality Table for Rejected Lives

          No. Exposed     Actual     Expected
Ages        to Risk       Deaths      Deaths
15-19           434           6          3.4
20-24         3,831          28         37.6
25-29        11,405         145        133.4
30-34        17,644         233        242.2
35-39        19,442         318        314.3
40-44        17,600         324        336.8
45-49        13,971         296        321.8
50-54        10,179         295        287.2
55-59         6,640         264        234.8
60-64         3,927         194        178.6
65-69         1,995          96        119.5
70-74           836          71         67.4
75-79           306          32         33.8
80-84            98          20         15.1
85-89            12           3          2.5

Total       108,320       2,325      2,328.4
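
The totals of Table III, and the sum of the squares of the unweighted deviations between actual and expected deaths, can be checked with a few lines of Python. The figures are those of the table above; the function names are illustrative.

```python
actual = [6, 28, 145, 233, 318, 324, 296, 295, 264, 194, 96, 71, 32, 20, 3]
expected = [3.4, 37.6, 133.4, 242.2, 314.3, 336.8, 321.8, 287.2, 234.8,
            178.6, 119.5, 67.4, 33.8, 15.1, 2.5]


def totals():
    """Total actual and total expected deaths over all age groups."""
    return sum(actual), round(sum(expected), 1)


def sum_of_squares():
    """Unweighted sum of squared deviations, actual minus expected."""
    return sum((a - e) ** 2 for a, e in zip(actual, expected))


if __name__ == "__main__":
    print("totals:", totals())
    print("sum of squared deviations:", round(sum_of_squares(), 2))
```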

Considering the somewhat meager experience on which the
graduation was based, I think it must be admitted that the
method of frequency curves comes surprisingly close to the
actual facts. In this connection it is of interest to note
that the actuaries of the Danish Statsanstalt made a
graduation of the above data on the basis of Makeham's
method and obtained from least square methods the following
values for the constants:*

A = 0.006

log B = 7.0566 - 10

log C = 0.025
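
To give a feeling for the size of these constants, the following sketch evaluates a Makeham curve from them. Several things here are assumptions and not taken from the text: the common form A + B·C^x is used (the cited formula may be stated for a different function), the logarithms are read as common logarithms, and the third constant is read as log C = 0.025.

```python
# Makeham constants as quoted (logs assumed to be base 10).
A = 0.006
LOG10_B = 7.0566 - 10.0
LOG10_C = 0.025   # assumed reading of the third constant


def makeham_force(age):
    """Assumed Makeham form A + B * C**age, evaluated through logarithms."""
    return A + 10.0 ** (LOG10_B + LOG10_C * age)


for age in (30, 50, 70):
    print(age, round(1000.0 * makeham_force(age), 2))
```

Under these assumptions the curve gives a value near 26 per 1,000 at age 50, which is of the same order as the graduated rate at that age.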

The "expected" deaths according to this latter
graduation, and on the basis of the above experience,
amount in total to 2,317, as against 2,325 "actual"
deaths and 2,328 "expected" deaths according to the
frequency curve method. Viewed from the standpoint
of the principle of least squares, it is also found
that the sum of the squares of the deviations is
smaller under the frequency curve method than under
the method of Makeham, which seems to be pretty good
evidence of the soundness of the method, in spite of
the fact that I have worked throughout with
unweighted observations. If properly chosen weights
were applied to the observations, even closer results
could be obtained.

* See formula (6), page 192 of Institute of Actuaries Text Book, Life
Contingencies, by E. F. Spurgeon, London, 1922.
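
The least-squares criterion can be illustrated on the quinquennial figures of Table III. The text does not say on which grouping the sums of squares were formed, and the group figures of the Makeham graduation are not given, so the sketch below only shows the computation for the frequency curve method. The weights proportional to 1/expected are a hypothetical choice made purely for illustration.

```python
# Actual and expected deaths by quinquennial age group (Table III).
ACTUAL = [6, 28, 145, 233, 318, 324, 296, 295, 264, 194, 96, 71, 32, 20, 3]
EXPECTED = [3.4, 37.6, 133.4, 242.2, 314.3, 336.8, 321.8, 287.2,
            234.8, 178.6, 119.5, 67.4, 33.8, 15.1, 2.5]


def sum_of_squares(actual, expected, weights=None):
    """Sum of (optionally weighted) squared deviations."""
    if weights is None:
        weights = [1.0] * len(actual)
    return sum(w * (a - e) ** 2 for a, e, w in zip(actual, expected, weights))


unweighted = sum_of_squares(ACTUAL, EXPECTED)
# Hypothetical weighting: each group weighted by 1 / expected deaths.
weighted = sum_of_squares(ACTUAL, EXPECTED, [1.0 / e for e in EXPECTED])

print(round(unweighted, 2), round(weighted, 2))
```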


TABLE II.

Mortality Experience of Rejected Scandinavian Risks (Male).

(1)      (2)      (3)      (4)        (5)       (6)        (7)       (8)
Age   Exposed   No. of  (3) : (2)    Hypo-   (5) × (4)  Graduated  (7) : (5)
      to Risk   Deaths             thetical    Crude      Death    1000 q_x
                                   Exposure    Death      Curve
                                               Curve

 15      11       0   0.00000       792        0       5.6     7.07
 16      31       1   0.03226       987       32       7.1     7.07
 17      64       1   0.01562      1223       19       9.2     7.52
 18     121       0   0.00000      1506        0      11.7     7.77
 19     207       4   0.01932      1842       36      15.4     8.36
 20     340       1   0.00294      2239        7      19.7     8.80
 21     501       1   0.00200      2705        5      25.0     9.24
 22     719       6   0.00834      3246       27      30.8     9.49
 23     982       6   0.00611      3871       24      38.8    10.02
 24    1289      14   0.01086      4586       50      47.8    10.42
 25    1619      22   0.01359      5399       73      58.2    10.78
 26    1986      23   0.01158      6316       73      70.6    11.18
 27    2287      34   0.01487      7341      109      85.0    11.58
 28    2597      29   0.01117      8478       95     101.7    12.00
 29    2916      37   0.01269      9728      123     120.5    12.39
 30    3180      38   0.01195     11092      133     142.0    12.80
 31    3395      50   0.01473     12566      185     166.4    13.24
 32    3564      44   0.01235     14146      175     193.5    13.68
 33    3700      46   0.01243     15822      197     223.4    14.12
 34    3806      55   0.01445     17585      254     257.0    14.61
 35    3882      48   0.01236     19419      240     293.3    15.10
 36    3943      64   0.01623     21307      346     332.8    15.62
 37    3921      72   0.01836     23230      427     375.3    16.16
 38    3880      66   0.01701     25164      428     420.0    16.69
 39    3816      68   0.01782     27086      483     467.7    17.27
 40    3737      66   0.01766     28969      512     517.6    17.87
 41    3637      63   0.01732     30785      533     566.9    18.41
 42    3539      59   0.01667     32506      542     623.3    19.17
 43    3426      62   0.01810     34105      617     678.2    19.89
 44    3261      74   0.02269     35553      807     732.7    20.61
 45    3079      67   0.02176     36827      801     787.8    21.39
 46    2941      61   0.02074     37903      786     842.4    22.23
 47    2793      46   0.01647     38762      638     895.1    22.97
 48    2653      61   0.02299     39387      906     945.9    24.02
 49    2505      61   0.02435     39767      968     994.3    25.00
 50    2348      61   0.02598     39894     1036    1039.0    26.04
 51    2184      65   0.02976     39767     1183    1079.9    27.16
 52    2024      66   0.03261     39387     1284    1116.0    28.33
 53    1882      59   0.03135     38762     1215    1147.4    29.53
 54    1741      44   0.02527     37903      958    1173.3    30.96
 55    1610      62   0.03851     36827     1418    1193.0    32.39
 56    1447      60   0.04147     35553     1474    1206.9    33.95
 57    1308      45   0.03440     34105     1173    1214.3    35.60
 58    1189      47   0.03953     32506     1285    1214.9    37.37
 59    1086      50   0.04604     30785     1417    1209.0    39.27
 60     966      44   0.04555     28969     1320    1197.0    41.32
 61     871      35   0.04019     27086     1089    1178.8    43.52
 62     786      35   0.04453     25164     1121    1154.2    45.87
 63     701      44   0.06277     23230     1458    1124.6    48.41
 64     603      36   0.05970     21307     1272    1090.1    51.16
 65     518      22   0.04247     19419      825    1050.7    54.11
 66     453      24   0.05298     17585      932    1006.3    57.22
 67     392      19   0.04847     15822      767     960.1    60.68
 68     340      16   0.04706     14146      666     909.6    64.30
 69     291      15   0.05155     12566      648     858.4    68.31
 70     244      25   0.10246     11092     1136     804.2    72.50
 71     193      17   0.08808      9728      857     750.9    77.19
 72     158      13   0.08228      8478      698     695.7    82.06
 73     132       9   0.06818      7341      501     642.4    87.51
 74     109       7   0.06422      6316      406     589.1    93.27
 75      91       8   0.08791      5399      475     537.7    99.59
 76      74      10   0.13514      4586      620     486.8   106.15
 77      58       8   0.13793      3871      534     440.3   113.74
 78      45       4   0.08889      3246      289     393.8   121.32
 79      37       2   0.05405      2705      146     351.9   130.09
 80      31       5   0.16129      2239      361     311.8   139.26
 81      24       6   0.25000      1842      461     274.5   149.02
 82      18       2   0.11112      1506      168     241.6   160.42
 83      15       4   0.26667      1223      326     209.5   171.30
 84       9       3   0.33334       987      329     181.5   183.89
 85       6       2   0.33334       792      264     155.9   196.84
 86       3       0   0.00000       631        0     133.4   211.41
 87       2       1   0.50000       499      250     113.4   227.26
 88       2       1   0.50000       393      197      95.5   243.00
 89     0.5       0   0.50000       307      154      79.2   257.98

Note: The observations above age 87 are not reliable.
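
The derived columns of Table II follow from the others by simple arithmetic: column (4) is (3) divided by (2), column (6) is (5) multiplied by (4), and column (8) is 1,000 times (7) divided by (5). A minimal sketch for three rows of the table, with hypothetical names, is given below; the rounding shown is an assumption about how the printed figures were obtained.

```python
# (age, exposed (2), deaths (3), hypothetical exposure (5), graduated deaths (7))
ROWS = [
    (30, 3180, 38, 11092, 142.0),
    (40, 3737, 66, 28969, 517.6),
    (50, 2348, 61, 39894, 1039.0),
]


def derived_columns(exposed, deaths, hypothetical, graduated):
    """Return columns (4), (6) and (8) for one age."""
    crude_rate = deaths / exposed                     # column (4)
    crude_curve = hypothetical * crude_rate           # column (6)
    rate_per_1000 = 1000.0 * graduated / hypothetical  # column (8)
    return round(crude_rate, 5), round(crude_curve), round(rate_per_1000, 2)


for age, exposed, deaths, hypothetical, graduated in ROWS:
    print(age, derived_columns(exposed, deaths, hypothetical, graduated))
```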
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Semi-invariants.      definition 
of, 12.
  Computation of, 46-48.
  General properties of, 16.

Sheppard corrections for adjusted moments, 80.

Standard deviation, 40.

Statistical series, homograde, 94.

Sum products, homogeneous, 56.

Symmetric functions, 38.
  Parameters viewed as, 11.

Taylor series, 2, 3.

Thiele, 1, 12, 19, 38, 61, 69, 72, 122.

Thompson, John S., 182.

Transformation.
  General theory of, 70.
  Linear, 45, 62, 101.
  Logarithmic, 72, 74, 77, 82, 87.
  Of variates, 101-104.

Westergaard, 119.

Wicksell, 72.

Yano, T., 180.
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